Getting Started

Tap Development Overview

Creating taps with the SDK requires overriding just two or three classes:

  1. The Tap class. This class governs configuration, validation, and stream discovery.

  2. The stream class. You have different options for your base class depending on the type of data source you are working with:

    • Stream - The generic base class for streams.

    • RESTStream - The base class for REST-type streams.

    • GraphQLStream - The base class for GraphQL-type streams. This class inherits from RESTStream, since GraphQL requests are likewise sent over HTTP.

  3. An optional authenticator class. You can omit this class entirely if you do not require authentication or if you prefer to write custom authentication logic. The supported authenticator classes are:

    • SimpleAuthenticator - This class is functionally equivalent to overriding the http_headers property in the stream class.

    • OAuthAuthenticator - This class performs an OAuth 2.0 authentication flow.

    • OAuthJWTAuthenticator - This class performs a JWT (JSON Web Token) authentication flow. Requires installing the singer-sdk[jwt] extra.

Target Development Overview

Creating targets with the SDK requires overriding just two classes:

  1. The Target class. This class governs configuration, validation, and stream discovery.

  2. The Sink class. You have two different options depending on whether your target prefers writing one record at a time versus writing in batches:

    • RecordSink writes one record at a time, via the process_record() method.

    • BatchSink writes one batch at a time. Important class members include:

      • start_batch() to (optionally) initialize a new batch.

      • process_record() to enqueue a record to be written.

      • process_batch() to write any queued records and cleanup local resources.

Note: The Sink class can receive records from one stream or from many. See the Sink documentation for more information on differences between a target’s Sink class versus a tap’s Stream class.
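
The batch lifecycle described above can be pictured with a plain-Python stand-in. This is only a sketch of the call order, not the SDK’s BatchSink: the ToyBatchSink class, its max_size limit, and the in-memory written list are hypothetical names introduced for illustration, and here the class triggers its own flush, whereas in a real target the SDK decides when a batch is drained.

```python
from typing import Any, Dict, List, Optional


class ToyBatchSink:
    """Stand-in that mimics the BatchSink method names (not the SDK class)."""

    def __init__(self, max_size: int = 2) -> None:
        self.max_size = max_size  # hypothetical batch-size limit
        # Stands in for the destination system; each entry is one written batch.
        self.written: List[List[Dict[str, Any]]] = []
        self._queue: Optional[List[Dict[str, Any]]] = None

    def start_batch(self) -> None:
        """Initialize an empty queue for a new batch."""
        self._queue = []

    def process_record(self, record: Dict[str, Any]) -> None:
        """Enqueue one record; nothing is written until the batch is flushed."""
        if self._queue is None:
            self.start_batch()
        self._queue.append(record)
        if len(self._queue) >= self.max_size:
            self.process_batch()

    def process_batch(self) -> None:
        """Write all queued records in one operation, then clean up."""
        if self._queue:
            self.written.append(list(self._queue))
        self._queue = None


sink = ToyBatchSink(max_size=2)
for rec in [{"id": 1}, {"id": 2}, {"id": 3}]:
    sink.process_record(rec)
sink.process_batch()  # flush the final, partially filled batch
```

A RecordSink, by contrast, would do its write directly inside process_record(), with no queue and no separate flush step.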

Building a New Tap or Target

First, install cookiecutter, Poetry, and optionally Tox:

# Install pipx if you haven't already
pip install pipx
pipx ensurepath

# Restart your terminal here, if needed, to get the updated PATH
pipx install cookiecutter
pipx install poetry

# Optional: Install Tox if you want to use it to run auto-formatters, linters, tests, etc.
pipx install tox

Tip

The minimum recommended version of cookiecutter is 2.2.0 (released 2023-07-06).

Now you can initialize your new project with the Cookiecutter template for taps:

cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/tap-template"

…or for targets:

cookiecutter https://github.com/meltano/sdk --directory="cookiecutter/target-template"

Note that you do not need to create the directory for the tap yourself. If you want the project at /projects/tap-mytap, run cookiecutter from /projects and the tap-mytap directory will be created for you.

Once you’ve answered the cookiecutter prompts, follow the instructions in the generated README.md file to complete your new tap or target. You can also reference the Meltano Tutorial for a more detailed guide.

Avoid repeating yourself

If you find yourself repeating the same inputs to the cookiecutter, you can create a .cookiecutterrc file in your home directory to set default values for the prompts.

For example, if you want to set the default value for your name and email, and the default stream type and authentication method, you can add the following to your ~/.cookiecutterrc file:

# ~/.cookiecutterrc
default_context:
  admin_name: Johnny B. Goode
  admin_email: jbg@example.com
  stream_type: REST
  auth_method: Bearer Token

Using an existing library

In some cases, there may already be a library that connects to the API and all you need the SDK for is to reformat the data into the Singer specification. The SDK is still a great choice for this. The Peloton tap is an example of this.

RESTful JSONPaths

By default, the Singer SDK for REST streams assumes the API responds with a JSON array of records, but you can easily override this behaviour by specifying the records_jsonpath expression in your RESTStream or GraphQLStream implementation:

class EntityStream(RESTStream):
    """Entity stream from a generic REST API."""
    records_jsonpath = "$.data.records[*]"

You can test your JSONPath expressions with the JSONPath Online Evaluator.

Nested array example

Many APIs return the records in an array nested inside a JSON object key.

  • Response:

    {
      "data": {
        "records": [
          { "id": 1, "value": "abc" },
          { "id": 2, "value": "def" }
        ]
      }
    }
    
  • Expression: $.data.records[*]

  • Result:

    [
      { "id": 1, "value": "abc" },
      { "id": 2, "value": "def" }
    ]
    

Nested object values example

Some APIs instead return the records as values inside an object where each key is some form of identifier.

  • Response:

    {
      "data": {
        "1": {
          "id": 1,
          "value": "abc"
        },
        "2": {
          "id": 2,
          "value": "def"
        }
      }
    }
    
  • Expression: $.data.*

  • Result:

    [
      { "id": 1, "value": "abc" },
      { "id": 2, "value": "def" }
    ]
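
For readers less familiar with JSONPath, the two expressions above select the same values as the plain-Python lookups below. This is only an illustrative sketch using the standard library; it is not how the SDK evaluates records_jsonpath, and the function names are hypothetical.

```python
import json

nested_array_response = json.loads("""
{"data": {"records": [{"id": 1, "value": "abc"}, {"id": 2, "value": "def"}]}}
""")

nested_object_response = json.loads("""
{"data": {"1": {"id": 1, "value": "abc"}, "2": {"id": 2, "value": "def"}}}
""")


def select_nested_array(response: dict) -> list:
    """Plain-Python equivalent of the expression $.data.records[*]."""
    return list(response["data"]["records"])


def select_nested_object_values(response: dict) -> list:
    """Plain-Python equivalent of the expression $.data.*"""
    return list(response["data"].values())


array_records = select_nested_array(nested_array_response)
object_records = select_nested_object_values(nested_object_response)
```

Both calls yield the same two records, which is why either response shape can feed the same stream schema once the right expression is set.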
    

Extra features

The following extra features are available for the Singer SDK:

Resources

Detailed Class Reference

For a detailed reference, please see the SDK Reference Guide.

Implementation Details

For more information about the SDK’s Singer implementation details, please see the SDK Implementation Details section.

Code Samples

For a list of code samples solving a variety of different scenarios, please see our Code Samples page.

CLI Samples

For a list of sample CLI commands you can run, please see our CLI Samples page.

Python Tips

We’ve collected some Python tips which may be helpful for new SDK users.

IDE Tips

Using the debugger features of your IDE can help you develop and fix bugs more easily and quickly. Setting breakpoints is also a great way to become familiar with the internals of the SDK itself.

VSCode Debugging

Ensure the interpreter you’re using in VSCode is set to use poetry. You can change this by using the command palette to go to interpreter settings. Doing this will also help with autocompletion.

To launch your plugin via its CLI with the built-in debugger, VSCode requires a launch configuration. An example launch configuration, added to your launch.json, might be as follows:

{
  // launch.json
  "version": "0.2.0",
  "configurations": [
    {
      "name": "tap-snowflake discovery",
      "type": "python",
      "request": "launch",
      "module": "tap_snowflake.tap",
      "args": ["--config", "config.json", "--discover"],
      "python": "${command:python.interpreterPath}",
      // Set to true to debug third-party library code
      "justMyCode": false
    }
  ]
}

PyCharm Debugging

See the JetBrains PyCharm documentation for more detail.

To launch the PyCharm debugger you can select “Edit Configuration” in the main menu to open the debugger configuration. Click “Add new run configuration”. Set the script path to the full path to your tap.py and parameters to something like --config .secrets/config.json. You can pass in additional parameters like --discover or --state my_state_file.json to test the discovery or state workflows.

Main Method

The above debugging configurations rely on an equivalent to the following snippet being added to the end of your tap.py or target.py file:

if __name__ == "__main__":
    TapSnowflake.cli()

This is automatically included in the most recent version of the tap and target cookiecutters.

Testing performance

We’ve had success using viztracer to create flame graphs for SDK-based packages and to check for serious performance bottlenecks.

You can do the same in your package. Start by installing viztracer:

$ poetry add --group dev viztracer

Then run your package’s CLI as normal, preceded by the viztracer command:

$ poetry run viztracer my-tap
$ poetry run viztracer -- my-target --config=config.json --input=messages.json

That command will produce a result.json file which you can explore with the vizviewer tool.

$ poetry run vizviewer result.json

The output should look like this:

SDK Flame Graph

Note: Chrome seems to work best for running the vizviewer app.