Python Tips for SDK Developers
Tip #1: Intro to Virtual Environments, Poetry, and Pipx
Everyone comes from a different perspective, so please select the scenario that you most identify with.
If you know nothing about virtual environments
If you are completely new to the concept of virtual environments, that’s great! You have nothing to “unlearn”.
Poetry and pipx will make your life easy. They create and manage virtual environments for you, so you never have to set one up by hand or worry about dependency conflicts between projects.
Pipx: Use this instead of pip whenever you are installing a Python program (as opposed to a Python library). Pipx automatically creates a virtual environment for each program and makes sure that the executables contained in the package are added to your PATH.
Poetry: Use this when you are developing in Python. The SDK cookiecutter template already sets you up for Poetry. When you run a command with `poetry run ...`, Poetry makes sure behind the scenes that the command runs in the correct virtual environment. This means you will automatically be running with whatever library versions you have specified with `poetry add ...`. If it ever feels like your environment may be stale, you can run `poetry install` to reinstall the dependencies recorded for your project.
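One way to see what `poetry run` does for you is to ask the interpreter itself. The minimal sketch below relies on standard-library behaviour: inside a virtual environment, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the Python installation it was created from. The filename `check_env.py` is only an example. Run it once with `python check_env.py` and once with `poetry run python check_env.py` from your project directory; with Poetry's default settings (which create a virtual environment per project), the second run should report that it is inside one.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the environment directory, while sys.base_prefix
    # is the base Python installation the environment was created from.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Build a one-line summary of which interpreter is running."""
    status = "inside" if in_virtualenv() else "outside"
    return f"{sys.executable} is running {status} a virtual environment (prefix: {sys.prefix})"


if __name__ == "__main__":
    print(describe_interpreter())
```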
If you already know about virtual environments
If you are used to working with virtual environments, the challenge with pipx and poetry is just to learn how to let these new tools do the work for you. Instead of manually creating and managing virtual environments, these two tools automate the process for you.
poetry: Handles package management during development.
Admittedly, there’s a learning curve. Instead of a `requirements.txt` file, everything is managed in `pyproject.toml`.
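Adding new dependencies is performed with `poetry add <pippable-library-ref>` or `poetry add -D <pippable-dev-only-ref>` (on Poetry 1.2 and later, `poetry add --group dev <pippable-dev-only-ref>` replaces the deprecated `-D` flag).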
If version conflicts occur, relax the version constraints in `pyproject.toml` for the libraries where the conflict is reported, then try again.
The biggest change is that `-e` (“editable mode”) doesn’t work the way it did with `pip`. Instead, use the shell-script approach from the cookiecutter template. The script automatically changes into the Poetry project directory and runs the needed `poetry run` commands from that folder. (The shell script includes a Stack Overflow link with more context.)
Poetry can also publish your libraries to PyPI.
pipx: Install pipx once, and then use pipx instead of pip.
If you are using Poetry for development, then everything else you would pip-install should be an executable tool or program, which is exactly what pipx is designed for.
You don’t need to create a virtual environment, and you don’t need to remember to activate or deactivate it. For instance, you can just run `pipx install meltano` and then run `meltano` directly: no virtual environment path, no prefix to remember, and no activation step.
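Running `meltano` "directly" works because pipx links each app's executables into a single bin directory that is on your PATH. As far as we know, that directory defaults to `~/.local/bin` and can be overridden with the `PIPX_BIN_DIR` environment variable, and `pipx ensurepath` adds it to your PATH; treat those defaults as assumptions and check your own pipx setup. The sketch below is a hypothetical troubleshooting helper (`pipx_bin_dir`, `bin_dir_on_path`, and `report` are illustrative names, not part of pipx) for when a freshly installed tool is "not found".

```python
import os
import shutil
from pathlib import Path


def pipx_bin_dir() -> Path:
    """Assumed pipx bin directory: $PIPX_BIN_DIR if set, else ~/.local/bin."""
    return Path(os.environ.get("PIPX_BIN_DIR", Path.home() / ".local" / "bin")).expanduser()


def bin_dir_on_path(bin_dir: Path) -> bool:
    """Return True if bin_dir appears among the entries of the PATH variable."""
    target = bin_dir.expanduser().resolve()
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        # Skip empty entries and compare normalized absolute paths.
        if entry and Path(entry).expanduser().resolve() == target:
            return True
    return False


def report(tool: str = "meltano") -> str:
    """Explain whether `tool` can be run directly from the shell."""
    location = shutil.which(tool)  # searches PATH the same way the shell does
    if location:
        return f"{tool} found at {location}"
    bin_dir = pipx_bin_dir()
    if not bin_dir_on_path(bin_dir):
        return f"{tool} not found; {bin_dir} is not on PATH (try `pipx ensurepath`)"
    return f"{tool} not found; try `pipx install {tool}`"


if __name__ == "__main__":
    print(report())
```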
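Relaxing constraints helps because of how Poetry reads them. When you run `poetry add <library>` without a version, Poetry records a caret constraint such as `^2.31.0`, and per Poetry's documentation a caret requirement allows any update that does not change the left-most non-zero version component. A conflict means two requirements on the same library leave no version in common. The sketch below is a simplified model of that rule (plain numeric versions only, no pre-releases or other operators); `caret_range` and `ranges_overlap` are hypothetical helpers, not Poetry APIs.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def caret_range(constraint: str) -> Tuple[Version, Version]:
    """Return the half-open range [lower, upper) allowed by a caret constraint.

    Only plain numeric constraints such as "^1.2.3", "^0.2" or "^1" are handled.
    """
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    given = [int(part) for part in constraint[1:].split(".")]
    if not 1 <= len(given) <= 3:
        raise ValueError(f"expected 1 to 3 version components: {constraint!r}")
    # Missing components default to 0 for the lower bound.
    lower = tuple(given + [0] * (3 - len(given)))
    # The left-most non-zero component may not change; if every given component
    # is zero, the last given component is the one that may not change.
    fixed = next((i for i, part in enumerate(given) if part != 0), len(given) - 1)
    upper = tuple(given[:fixed] + [given[fixed] + 1] + [0] * (2 - fixed))
    return lower, upper


def ranges_overlap(a: str, b: str) -> bool:
    """Return True if at least one version satisfies both caret constraints."""
    (a_low, a_high), (b_low, b_high) = caret_range(a), caret_range(b)
    # Tuples compare component by component, matching numeric version order.
    return a_low < b_high and b_low < a_high
```

For example, `caret_range("^0.2.3")` returns `((0, 2, 3), (0, 3, 0))`, and `ranges_overlap("^1.4", "^2.0")` is `False`: no release satisfies both, so the resolver fails until one side is loosened (for instance to `>=1.4`).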
What is a virtual environment anyway?
The quick explanation of a virtual environment is: a directory on your machine that holds a complete set of version-specific Python packages, isolated from other copies of those same libraries so that one program's version requirements cannot conflict with another's. Each program can have its own version requirements for its dependencies, and that’s okay because each virtual environment is separate from the others.
For years, Python developers have had to create, track, and manage their virtual environments manually, but luckily, now we don’t have to!
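To make "a directory on your machine" concrete, the sketch below uses the standard-library `venv` module to create a throwaway environment in a temporary directory and report the pieces that make it self-contained. It passes `with_pip=False` so it runs quickly and offline; `create_env` and `describe_env` are illustrative names, not part of Poetry or pipx, which create and manage similar directories for you.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a bare virtual environment (without pip) at env_dir and return it."""
    # Symlink the interpreter on POSIX (as `python -m venv` does), copy it on Windows.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir


def describe_env(env_dir: Path) -> dict:
    """Summarize the files and folders that make env_dir an isolated environment."""
    scripts = "Scripts" if os.name == "nt" else "bin"
    return {
        # pyvenv.cfg marks the directory as a venv and records the base interpreter.
        "config": env_dir / "pyvenv.cfg",
        # The environment's own interpreter and installed executables live here.
        "scripts_dir": env_dir / scripts,
        # Packages installed into this environment land here, not in the base install.
        "site_packages": next(env_dir.glob("**/site-packages"), None),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        info = describe_env(create_env(Path(tmp) / "demo-env"))
        for name, path in info.items():
            print(f"{name}: {path}")
```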
Tip #2: Static vs Dynamic Properties in Python and the SDK
In Python, properties of classes like `Stream` and `Tap` can generally be overridden
in two ways: statically or dynamically. For instance, `primary_keys` and
`replication_key` should be declared statically if their values are known ahead of time
(during development), and dynamically if they vary from one
environment to another or can change at runtime.
Here’s a simple example of static definitions based on the cookiecutter template. This example defines the primary key and replication key as fixed values which will not change.
```python
from singer_sdk.streams import Stream


class SimpleSampleStream(Stream):
    primary_keys = ["id"]
    replication_key = None
```
Dynamic property example
Here is a similar example except that the same properties are calculated dynamically based on user-provided inputs:
```python
from singer_sdk.streams import Stream


class DynamicSampleStream(Stream):
    @property
    def primary_keys(self):
        """Return primary key dynamically based on user inputs."""
        return self.config["primary_key"]

    @property
    def replication_key(self):
        """Return replication key dynamically based on user inputs."""
        result = self.config.get("replication_key")
        if not result:
            self.logger.warning("Danger: could not find replication key!")
        return result
```
Note that the static example is more concise, while the dynamic example is more extensible.
Use the static syntax whenever you are dealing with stream properties that won’t change, and use the dynamic syntax whenever you need to calculate the stream’s properties or discover them at runtime.
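Both styles work because code that reads the values does not care how they were defined: a class attribute and a property are read with the same `self.primary_keys` syntax. The sketch below illustrates that with plain Python; `BaseStream` is a hypothetical stand-in for the SDK's `Stream` class (not its real API), and the `"primary_key"` config entry is assumed to hold a single key name.

```python
from typing import List, Optional


class BaseStream:
    """Hypothetical stand-in for an SDK base class; not the real Stream API."""

    primary_keys: Optional[List[str]] = None
    replication_key: Optional[str] = None

    def __init__(self, config: dict) -> None:
        self.config = config

    def describe(self) -> str:
        # Base-class code reads the values the same way, whether a subclass
        # declared them statically or computes them in a property.
        return f"{type(self).__name__}: pk={self.primary_keys}, rk={self.replication_key}"


class StaticStream(BaseStream):
    # Known during development: plain class attributes.
    primary_keys = ["id"]
    replication_key = None


class DynamicStream(BaseStream):
    # Depends on user config: computed each time the attribute is read.
    @property
    def primary_keys(self) -> List[str]:
        return [self.config["primary_key"]]

    @property
    def replication_key(self) -> Optional[str]:
        return self.config.get("replication_key")
```

With these definitions, `StaticStream({}).describe()` returns `"StaticStream: pk=['id'], rk=None"`, while `DynamicStream({"primary_key": "order_id", "replication_key": "updated_at"}).describe()` returns `"DynamicStream: pk=['order_id'], rk=updated_at"`.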
For those new to Python, note that the dynamic syntax is identical to declaring a function or method, with one difference: the `@property` decorator directly above the method definition. This one change tells Python that you want to access the method as a property (as in `pk = stream.primary_keys`) instead of as a callable function (as in `pk = stream.primary_keys()`).
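The difference the decorator makes can be seen with two otherwise identical toy classes (illustrative names, unrelated to the SDK). Without `@property`, callers must add parentheses, and omitting them gives you the bound method object itself; with `@property`, the getter runs on plain attribute access, so adding parentheses tries to call the returned list and raises `TypeError`.

```python
class WithMethod:
    """Plain method: callers must add parentheses."""

    def primary_keys(self):
        return ["id"]


class WithProperty:
    """Same logic behind @property: callers use attribute syntax."""

    @property
    def primary_keys(self):
        return ["id"]


def demo() -> None:
    method_obj, prop_obj = WithMethod(), WithProperty()
    print(method_obj.primary_keys())          # ['id']
    print(callable(method_obj.primary_keys))  # True: without (), you get the method itself
    print(prop_obj.primary_keys)              # ['id']: the getter runs on attribute access
    try:
        prop_obj.primary_keys()  # the getter returns a list, and a list is not callable
    except TypeError as exc:
        print(f"TypeError: {exc}")


if __name__ == "__main__":
    demo()
```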
For more examples, please see the Code Samples page.