Getting Started
Tap Development Overview
Creating taps with the SDK requires overriding just two or three classes:
1. The Tap class. This class governs configuration, validation, and stream discovery.
2. The stream class. You have different options for your base class depending on the type of data source you are working with:
   - Stream: The generic base class for streams.
   - RESTStream: The base class for REST-type streams.
   - GraphQLStream: The base class for GraphQL-type streams. This class inherits from RESTStream, since GraphQL is built upon REST.
3. An optional authenticator class. You can omit this class entirely if you do not require authentication or if you prefer to write custom authentication logic. The supported authenticator classes are:
   - SimpleAuthenticator: This class is functionally equivalent to overriding the http_headers property in the stream class.
   - OAuthAuthenticator: This class performs an OAuth 2.0 authentication flow.
   - OAuthJWTAuthenticator: This class performs a JWT (JSON Web Token) authentication flow. Requires installing the singer-sdk[jwt] extra.
Target Development Overview
Creating targets with the SDK requires overriding just two classes:
1. The Target class. This class governs configuration, validation, and routing of incoming stream data to sinks.
2. The Sink class. You have two different options depending on whether your target prefers writing one record at a time or writing in batches:
   - RecordSink writes one record at a time, via the process_record() method.
   - BatchSink writes one batch at a time. Important class members include:
     - start_batch() to (optionally) initialize a new batch.
     - process_record() to enqueue a record to be written.
     - process_batch() to write any queued records and clean up local resources.
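The BatchSink hooks are easiest to understand through the order in which they fire. The dependency-free toy below mimics that order with a plain Python class. ToyBatchSink and the drain() driver are hypothetical stand-ins, not SDK classes; the method names and the context dict only loosely mirror BatchSink's hooks, and the fixed max_size flush rule is an assumption (in a real target, the SDK decides when a sink is drained).

```python
from __future__ import annotations

from typing import Any


class ToyBatchSink:
    """Hypothetical toy mimicking the BatchSink hook order (not the SDK class)."""

    max_size = 3  # assumption: flush after this many queued records

    def __init__(self) -> None:
        self.written: list[list[dict[str, Any]]] = []  # one entry per flushed batch
        self.context: dict[str, Any] = {}
        self.start_batch(self.context)

    def start_batch(self, context: dict[str, Any]) -> None:
        # Optionally initialize per-batch resources; here, just an empty queue.
        context["records"] = []

    def process_record(self, record: dict[str, Any], context: dict[str, Any]) -> None:
        # Enqueue the record; nothing is written yet.
        context["records"].append(record)

    def process_batch(self, context: dict[str, Any]) -> None:
        # "Write" all queued records at once, then release the queue.
        self.written.append(list(context["records"]))
        context["records"].clear()


def drain(sink: ToyBatchSink, records: list[dict[str, Any]]) -> None:
    """Hypothetical driver standing in for the SDK's record routing."""
    for record in records:
        sink.process_record(record, sink.context)
        if len(sink.context["records"]) >= sink.max_size:
            sink.process_batch(sink.context)
            sink.start_batch(sink.context)
    if sink.context["records"]:  # flush the final, partial batch
        sink.process_batch(sink.context)


sink = ToyBatchSink()
drain(sink, [{"id": i} for i in range(7)])
print([len(batch) for batch in sink.written])  # [3, 3, 1]
```

A RecordSink corresponds to doing the write inside process_record() itself, with no queue and no separate batch step.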
Note: The Sink class can receive records from one stream or from many. See the Sink documentation for more information on the differences between a target’s Sink class and a tap’s Stream class.
Building a New Tap or Target
First, install cookiecutter, Poetry, and optionally Tox:
# Install pipx if you haven't already
pip install pipx
pipx ensurepath
# Restart your terminal here, if needed, to get the updated PATH
pipx install cookiecutter
pipx install poetry
# Optional: Install Tox if you want to use it to run auto-formatters, linters, tests, etc.
pipx install tox
Tip
The minimum recommended version of cookiecutter is 2.2.0 (released 2023-07-06).
Now you can initialize your new project with the Cookiecutter template for taps:
cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/tap-template"
…or for targets:
cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/target-template"
Note that you do not need to create the directory for the tap yourself. If you want the project at /projects/tap-mytap, run cookiecutter in /projects and the tap-mytap project will be created there.
Once you’ve answered the cookiecutter prompts, follow the instructions in the generated README.md file to complete your new tap or target. You can also reference the Meltano Tutorial for a more detailed guide.
Avoid repeating yourself
If you find yourself repeating the same inputs to cookiecutter, you can create a .cookiecutterrc file in your home directory to set default values for the prompts.
For example, to set default values for your name and email, as well as the default stream type and authentication method, add the following to your ~/.cookiecutterrc file:
# ~/.cookiecutterrc
default_context:
  admin_name: Johnny B. Goode
  admin_email: jbg@example.com
  stream_type: REST
  auth_method: Bearer Token
Application configuration
The SDK lets you define a configuration schema for your tap or target with full support for JSON Schema validation. See the in-depth guide on defining a configuration schema for more details.
Using an existing library
In some cases, a library that connects to the API may already exist, and all you need the SDK for is to reformat the data into the Singer specification. The SDK is still a great choice in this case; the Peloton tap is an example of this approach.
RESTful JSONPaths
By default, the Singer SDK for REST streams assumes the API responds with a JSON array of records, but you can easily override this behaviour by specifying the records_jsonpath expression in your RESTStream or GraphQLStream implementation:
from singer_sdk.streams import RESTStream

class EntityStream(RESTStream):
    """Entity stream from a generic REST API."""

    records_jsonpath = "$.data.records[*]"
You can test your JSONPath expressions with the JSONPath Online Evaluator.
Nested array example
Many APIs return the records in an array nested inside a JSON object key.
Response:
{ "data": { "records": [ { "id": 1, "value": "abc" }, { "id": 2, "value": "def" } ] } }
Expression:
$.data.records[*]
Result:
[ { "id": 1, "value": "abc" }, { "id": 2, "value": "def" } ]
Nested object values example
Some APIs instead return the records as values inside an object where each key is some form of identifier.
Response:
{ "data": { "1": { "id": 1, "value": "abc" }, "2": { "id": 2, "value": "def" } } }
Expression:
$.data.*
Result:
[ { "id": 1, "value": "abc" }, { "id": 2, "value": "def" } ]
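To reason about what an expression will select before wiring it into a stream, the tiny evaluator below handles only the $, .key, .* and [*] steps used in the two examples above, and treats both wildcards alike. It is an illustration under those assumptions, not the SDK's JSONPath engine: the find() helper is hypothetical, and real expressions (filters, slices, recursive descent) need a full JSONPath library or the online evaluator mentioned earlier.

```python
from __future__ import annotations

import re
from typing import Any

# One step of a deliberately tiny JSONPath subset: ".key", ".*" or "[*]".
_STEP = re.compile(r"\.([A-Za-z0-9_]+|\*)|\[\*\]")


def find(expression: str, document: Any) -> list[Any]:
    """Hypothetical helper: evaluate $, .key, .* and [*] against a parsed JSON value."""
    if not expression.startswith("$"):
        raise ValueError("expression must start with '$'")
    rest = expression[1:]
    matches = [document]
    pos = 0
    while pos < len(rest):
        step = _STEP.match(rest, pos)
        if step is None:
            raise ValueError(f"unsupported syntax at {rest[pos:]!r}")
        pos = step.end()
        key = step.group(1)  # None for "[*]", "*" for ".*"
        next_matches: list[Any] = []
        for node in matches:
            if key is None or key == "*":
                # Wildcard: every element of a list, or every value of an object.
                if isinstance(node, list):
                    next_matches.extend(node)
                elif isinstance(node, dict):
                    next_matches.extend(node.values())
            elif isinstance(node, dict) and key in node:
                next_matches.append(node[key])
        matches = next_matches
    return matches


array_response = {"data": {"records": [{"id": 1, "value": "abc"}, {"id": 2, "value": "def"}]}}
object_response = {"data": {"1": {"id": 1, "value": "abc"}, "2": {"id": 2, "value": "def"}}}

print([r["id"] for r in find("$.data.records[*]", array_response)])  # [1, 2]
print([r["id"] for r in find("$.data.*", object_response)])  # [1, 2]
```

Both expressions yield the same list of record objects, which is exactly what the stream needs regardless of how the API nests them.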
Extra features
The following extra features are available for the Singer SDK:
- faker: Enables the use of Faker in stream maps.
- jwt: Enables the OAuthJWTAuthenticator class for JWT (JSON Web Token) authentication.
- s3: Enables AWS S3 as BATCH storage.
- parquet: Enables Parquet as a BATCH encoding.
- testing: Pytest dependencies required to use the Tap & Target Testing Framework.
Resources
Detailed Class Reference
For a detailed reference, please see the SDK Reference Guide.
Implementation Details
For more information about the SDK’s Singer implementation details, please see the SDK Implementation Details section.
Code Samples
For a list of code samples solving a variety of different scenarios, please see our Code Samples page.
CLI Samples
For a list of sample CLI commands you can run, see the CLI samples page.
Python Tips
We’ve collected some Python tips which may be helpful for new SDK users.
IDE Tips
Using the debugger features of your IDE can help you develop and fix bugs more easily and quickly. Breakpoints are also a great way to become familiar with the internals of the SDK itself.
VSCode Debugging
Ensure that VSCode uses your Poetry-managed virtual environment as its Python interpreter. You can change this by running “Python: Select Interpreter” from the command palette. Doing this also helps with autocompletion.
In order to launch your plugin via its CLI with the built-in debugger, VSCode requires a launch configuration.
An example launch configuration, added to your launch.json, might look as follows:
{
// launch.json
"version": "0.2.0",
"configurations": [
{
"name": "tap-snowflake discovery",
"type": "python",
"request": "launch",
"module": "tap_snowflake.tap",
"args": ["--config", "config.json", "--discover"],
"python": "${command:python.interpreterPath}",
// Set to true to debug third-party library code
"justMyCode": false,
}
]
}
PyCharm Debugging
See JetBrains’ PyCharm documentation for more detail.
To launch the PyCharm debugger, select “Edit Configurations” from the main menu to open the run/debug configuration dialog.
Click “Add new run configuration”. Set the script path to the full path of your tap.py and the parameters to something like --config .secrets/config.json.
You can pass in additional parameters like --discover or --state my_state_file.json to test the discovery or state workflows.
Main Method
The above debugging configurations rely on an equivalent of the following snippet being added to the end of your tap.py or target.py file:
if __name__ == "__main__":
    TapSnowflake.cli()
This is automatically included in the most recent version of the tap and target cookiecutters.
Testing performance
We’ve had success using viztracer to create flame graphs for SDK-based packages and identify any serious performance bottlenecks.
To do the same in your package, start by installing viztracer:
$ poetry add --group dev viztracer
Then run your package’s CLI as normal, preceded by the viztracer command:
$ poetry run viztracer my-tap
$ poetry run viztracer -- my-target --config=config.json --input=messages.json
That command will produce a result.json file, which you can explore with the vizviewer tool:
$ poetry run vizviewer result.json
The output should look like this:
[Figure: example viztracer flame graph rendered in vizviewer]
Note: Chrome seems to work best for running the vizviewer app.