Singer Tap Porting Guide
This guide walks you through the process of migrating an existing Singer Tap over to the SDK.
Want to follow along in a real world porting example? See our recorded pair coding session for the
A Clear Slate
When porting over an existing tap, most developers find it easier to start from a fresh repo than to incrementally change their existing one.
Within your existing repo, create a new branch.
Move all of the files in the old branch into a subfolder called “archive”.
Commit and push the result to your new branch. (You’ll do this several times along the way, which creates a fresh tree and a fresh diff for subsequent commits.)
Now follow the steps in the dev guide to create a new project using the Tap cookiecutter.
Copy all the files from the cookiecutter output into your main repo and commit the result.
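The archive step above can also be scripted. The following is a minimal sketch, assuming the repo root is passed in explicitly and that only `.git` and the archive folder itself should stay in place; the function name `archive_repo_files` is hypothetical and not part of the SDK.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def archive_repo_files(repo_root: Path, archive_name: str = "archive") -> list[str]:
    """Move every top-level entry except .git and the archive folder into archive/.

    Returns the names of the entries that were moved, in sorted order.
    """
    archive_dir = repo_root / archive_name
    archive_dir.mkdir(exist_ok=True)

    moved = []
    for entry in sorted(repo_root.iterdir()):
        # Keep the git metadata and the archive folder where they are.
        if entry.name in {".git", archive_name}:
            continue
        shutil.move(str(entry), str(archive_dir / entry.name))
        moved.append(entry.name)
    return moved
```

After running something like this on the new branch, commit and push the result before copying in the cookiecutter output.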
Settings and Readme
Next, we’ll copy over the settings and readme from the old project to the new one.
Open archive/README.md and copy-paste the old file into the new README.md. Generally, the best place to insert it is the settings section; move things around as needed. Commit the result when you’re happy with the new file.
Open README.md in side-by-side mode with tap.py and copy-paste each setting and its description into an appropriate type helper class (prefixed with
If settings are not defined in your README.md, you can try searching the archive/**.py files for references to config[, and/or check any other available reference for the expected input settings.
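The search for `config[` references can be automated. This is a rough sketch using only the standard library; it assumes the old code reads settings as `config["name"]` or `config.get("name")`, so other access styles would need their own pattern. The helper name `find_config_keys` is hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches config["key"], config['key'], and config.get("key").
CONFIG_KEY_PATTERN = re.compile(r"""config(?:\[|\.get\()\s*['"]([^'"]+)['"]""")


def find_config_keys(archive_dir: Path) -> dict[str, list[str]]:
    """Map each setting name to the archived .py files that reference it."""
    found: dict[str, set[str]] = {}
    for path in archive_dir.rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for key in CONFIG_KEY_PATTERN.findall(text):
            relative = path.relative_to(archive_dir).as_posix()
            found.setdefault(key, set()).add(relative)
    # Sort keys and file names so the report is stable between runs.
    return {key: sorted(files) for key, files in sorted(found.items())}
```

Each key in the result is a candidate setting to declare in tap.py; the file list shows where to look for how the old tap used it.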
If you are building a SQL tap
Since SQL taps leverage the excellent SQLAlchemy library, most behaviors are already predefined and automatic. This includes catalog creation, table scanning, and many other common challenges.
If you are porting over a SQL tap, skip ahead now to the Installing Dependencies section and make sure your SQL provider’s SQLAlchemy drivers are included in the added library dependencies. Also, when you get to the step of searching for TODO items, pay close attention to get_sqlalchemy_url(), since this will drive authentication and connectivity.
Continue until just before you reach the “Pagination” section, at which point you are probably done! 🚀 Optionally, you can further optimize performance by overriding
get_records() with a sync method native to your SQL operations.
client.py file (depending on auth method) and locate the authentication logic.
In your archived files, open client.py or whichever file pertains to authentication. (You’ll use this for reference in the next step.)
Update the authenticator methods by applying the logic and config values as demonstrated in the archived python code.
Define your first stream
Before you begin this section, please select a stream you would like to port as your first stream. This should be a simple stream without complex logic. If your tap has nested structure, start with a top level stream rather than a child stream.
Open streams.py and modify one of the samples to fit your first stream’s name.
Make sure you set
If you have a schema file for each stream:
Move the entire schemas folder out of the
schema_filepath property to be equal to the schema file for this stream.
If you are declaring schemas directly (without an existing JSON schema file):
Use the typing helpers (th.*) to define just the replication_key and 3-5 additional fields.
Don’t worry about defining all properties up front. Instead, come back to this step after you finish a successful stream test.
Once you have a single stream defined, with 3-6 properties, you’re ready to continue to the next step.
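To show how small a first schema can be, here is a sketch written as a plain JSON Schema dictionary, which is the same kind of structure the typing helpers build. The stream, its field names, and the `check_first_schema` helper are all hypothetical; the check only encodes the advice above (replication key present, a handful of properties).

```python
from __future__ import annotations

# Hypothetical first-pass schema for an "orders" stream: the replication key
# plus a few fields. More properties can be added after the first good sync.
ORDERS_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer", "null"]},
        "updated_at": {"type": ["string", "null"], "format": "date-time"},
        "status": {"type": ["string", "null"]},
        "total": {"type": ["number", "null"]},
    },
}
ORDERS_REPLICATION_KEY = "updated_at"


def check_first_schema(schema: dict, replication_key: str) -> list[str]:
    """Return a list of problems with a first-pass schema (empty if none)."""
    problems = []
    properties = schema.get("properties", {})
    if replication_key not in properties:
        problems.append(f"replication key '{replication_key}' is not a property")
    if len(properties) < 3:
        problems.append("fewer than 3 properties defined")
    return problems
```

Calling `check_first_schema(ORDERS_SCHEMA, ORDERS_REPLICATION_KEY)` returns an empty list for the schema above.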
Installing Dependencies
Check for required libraries in archive/setup.py. For each library you find:
# Add a library:
poetry add my-library
# Or add the same library with version constraints:
poetry add my-library==1.0.2
You can probably skip any libraries related to singer-*, as these functions are already managed in the SDK.
Once you have the necessary dependencies added, run
poetry install to make sure everything is ready to go.
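If the old setup.py lists its dependencies as a literal `install_requires=[...]`, the `poetry add` commands can be generated instead of typed by hand. This is a sketch under that assumption; requirements built dynamically (for example, from a variable) are not handled and yield an empty list. Both helper names are hypothetical.

```python
from __future__ import annotations

import ast
import re
import shlex


def extract_install_requires(setup_py_source: str) -> list[str]:
    """Return the literal install_requires list from setup.py source, if any."""
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.keyword) and node.arg == "install_requires":
            try:
                return list(ast.literal_eval(node.value))
            except ValueError:
                # Not a literal list (e.g. a variable reference).
                return []
    return []


def poetry_add_commands(requirements: list[str]) -> list[str]:
    """Build one `poetry add` command per requirement, skipping singer-* libraries."""
    commands = []
    for requirement in requirements:
        # The package name is everything before the first version specifier.
        name = re.split(r"[<>=!~\[; ]", requirement, maxsplit=1)[0]
        if name.startswith("singer-"):
            continue
        # Quote so that characters like ">" are not interpreted by the shell.
        commands.append(f"poetry add {shlex.quote(requirement)}")
    return commands
```

Review the generated commands before running them, since version pins from the old tap may be older than what you want in the new project.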
TODO items in
Open tap.py and search for TODO items. Depending on the type of tap you are porting, you will likely have to provide your new stream’s class names so that the tap class knows to invoke them.
Open client.py and search for TODO items. If your API type requires a url_base, go ahead and input it now.
You can postpone the other TODOs for now. Pagination will be addressed in the steps below.
Note: You do not have to resolve TODOs everywhere in the project, but if there are any sections you can obviously resolve, you can go ahead and do so now.
Test, debug, repeat, … until… success!
This is the stage where you’ll finally see data streaming from the tap. 🙌
If you have not already done so, run
poetry install to make sure your project and its dependencies are properly installed in your virtual environment.
Repeat the following steps until you see a help message:
Run poetry run tap-mysource --help to confirm the program can run.
Find and fix any errors that occur.
Now, repeat the following steps until you get data coming through your tap:
Run poetry run tap-mysource to attempt your first data sync.
Find and fix any errors that occur.
If you run into an error, go back and debug, and especially double-check your authentication process and input credentials.
If you’re able to see data coming from your tap, congrats!
Important: If you’ve gotten this far, this is a good time to commit your code back to your branch. In case anything breaks in the subsequent steps, you’ll easily be able to get back to this point and/or see what has changed since the successful sync.
Pagination
Pagination is different for almost every API. No single method covers every API’s approach to pagination.
Most likely you will use
get_next_page_token to parse and return whatever the “next page” token is for your source, and you’ll use
get_url_params to define how to pass the “next page” token back to the API when asking for subsequent pages.
When you think you have it right, run
poetry run tap-mysource again, and debug until you are confident the result is including multiple pages back from the API.
Note: Depending on how well the API is designed, this could take 5 minutes or multiple hours. If you need help, tools such as Postman or Thunder Client can be helpful in debugging the API’s specific quirks.
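The division of labor between the two methods can be sketched without the SDK. The functions below are standalone stand-ins that borrow the method names; the SDK methods themselves take different arguments. The response shape (`items`, `next_cursor`) and the `cursor` and `limit` query parameters are assumptions about a hypothetical API.

```python
from __future__ import annotations

from typing import Callable, Iterator, Optional


def get_next_page_token(response_json: dict, previous_token: Optional[str]) -> Optional[str]:
    """Parse the "next page" token out of a response body, or None when done."""
    token = response_json.get("next_cursor")
    # Stop if the API returns no token, or repeats the last one (loop guard).
    if not token or token == previous_token:
        return None
    return token


def get_url_params(next_page_token: Optional[str], page_size: int = 100) -> dict:
    """Build the query parameters, passing the token back when there is one."""
    params: dict = {"limit": page_size}
    if next_page_token:
        params["cursor"] = next_page_token
    return params


def iterate_pages(fetch: Callable[[dict], dict]) -> Iterator[dict]:
    """Yield records from every page; `fetch` stands in for the HTTP request."""
    token: Optional[str] = None
    while True:
        page = fetch(get_url_params(token))
        yield from page.get("items", [])
        token = get_next_page_token(page, token)
        if token is None:
            break
```

Passing a fake `fetch` function that returns canned pages is a quick way to confirm the token handling before pointing the tap at the real API.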
Now is a good time to test that the built-in tests are working as expected:
poetry run pytest
Create the remaining streams
Now that basic authentication, pagination, and tests are all working, you can freely add the remaining streams to
As should be expected, you are free to subclass streams in order to have their behavior be inherited from other stream classes.
For instance, if 3 streams use one pagination method, and 5 other streams use a different method, you can have each stream created as a subclass of a stream that has desired behavior.
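The subclassing pattern described above might look like the following. This is an illustration in plain Python rather than SDK classes: the base classes, the `params_for_page` method, and the stream names are all hypothetical, and in a real tap the bases would themselves derive from the SDK stream class in client.py.

```python
class OffsetPaginatedStream:
    """Shared behavior for streams whose API pages by offset and limit."""

    name = ""
    page_size = 100

    def params_for_page(self, page_index: int) -> dict:
        return {"offset": page_index * self.page_size, "limit": self.page_size}


class PageNumberPaginatedStream:
    """Shared behavior for streams whose API pages by a 1-based page number."""

    name = ""

    def params_for_page(self, page_index: int) -> dict:
        return {"page": page_index + 1}


class UsersStream(OffsetPaginatedStream):
    name = "users"


class OrdersStream(OffsetPaginatedStream):
    name = "orders"
    page_size = 50  # Override only what differs from the shared base.


class EventsStream(PageNumberPaginatedStream):
    name = "events"
```

Each concrete stream then declares only its own name, schema, and any overrides, while the pagination logic lives in one place per method.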
If you have streams which invoke each other in a nested layout, please refer to the parent_stream_class property and its related documentation.
As before, if you do not already have a full JSON Schema file for each stream type, it is generally good practice to start with just 5-8 properties per stream. You don’t have to define all properties up front; it is generally more valuable to first test that each stream is getting data.
Run pytest again, add stream properties, and repeat
Now that all streams are defined, run
poetry run pytest
If pytest is successful, add properties missing from your prior iteration.
Run pytest again.
Continue adding properties and testing until all streams are fully defined.
Optional Next Steps
Handle legacy state conversions
The SDK will automatically handle
STATE for you 99% of the time. However, it is very likely that the legacy version of the tap has a different
STATE format in comparison with the SDK format. If you want to seamlessly support both old and new STATE formats, you’ll need to define a conversion operation.
To handle the conversion operation, you’ll override
Tap.load_state(). The exact process of converting state is outside of this guide, but please check the STATE implementation docs for an explanation of general format expectations.
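As a rough illustration of what such a conversion could look like, the sketch below assumes a hypothetical legacy format that maps stream names directly to bookmark values, and assumes the target format nests each stream under `bookmarks` with `replication_key` and `replication_key_value` entries. Check both assumptions against your legacy tap and the STATE implementation docs before relying on it.

```python
from __future__ import annotations


def convert_legacy_state(state: dict, replication_keys: dict[str, str]) -> dict:
    """Convert a flat legacy STATE (assumed shape) into a bookmarks-based one.

    `replication_keys` maps stream name to the stream's replication key.
    """
    # Assumption: anything with a top-level "bookmarks" key is already converted.
    if "bookmarks" in state:
        return state

    bookmarks = {}
    for stream_name, value in state.items():
        key = replication_keys.get(stream_name)
        if key is None:
            # Unknown stream: skip rather than guess at its replication key.
            continue
        bookmarks[stream_name] = {
            "replication_key": key,
            "replication_key_value": value,
        }
    return {"bookmarks": bookmarks}
```

A function like this would be called from the overridden Tap.load_state() before the state is handed on to the default implementation.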
Leverage Auto Generated README
The SDK provides autogenerated markdown you can paste into your README:
poetry run tap-mysource --about --format=markdown
This text will automatically document all settings, including setting descriptions. Optionally, paste this into your existing