Inline Stream Maps¶

Introduction¶

SDK-based taps, targets, and mappers automatically support the custom inline mappings feature. Stream mappings can be applied to the following real-world use cases.

Tip

In all examples below where null is used as a value, the special string "__NULL__" can be used instead.

Stream-Level Mapping Applications¶

Stream aliasing: streams can be aliased to provide custom naming downstream.
Stream filtering: stream records can be filtered based on any user-defined logic.
Stream duplication: streams can be split or duplicated and then sent as multiple distinct streams to the downstream target.

Property-Level Mapping Applications¶

Property-level aliasing: properties can be renamed in the resulting stream.
Property-level transformations: properties can be transformed inline.
Property-level exclusions: properties can be removed from the resulting stream.
Property-level additions: new properties can be created based on inline user-defined expressions.

Schema Flattening Applications¶

Flatten nested properties: separates large complex properties into multiple distinct fields.

For instance, a complex user property may look like this:

{
    // ...
    "user": {
        "first_name": "Jane",
        "last_name": "Carter",
        "id": "jcarter"
    }
}

Rather than receive the entire record as one large structure, flattening the record would output three distinct fields:

user__first_name
user__last_name
user__id

Flattening Configuration Options¶

When flattening is enabled, the following configuration options are available:

flattening_enabled: Set to true to enable schema flattening.
flattening_max_depth: The maximum depth of nested properties to flatten (required when flattening is enabled).
flattening_max_key_length: The maximum length of flattened key names (optional, defaults to 255 characters).

When a flattened key would exceed the flattening_max_key_length, the SDK automatically abbreviates parent key names to keep the total length under the limit while preserving all the key components.

Flattening Example¶

meltano.yml

flattening_enabled: true
flattening_max_depth: 1   # flatten only top-level properties
flattening_max_key_length: 100  # optional, defaults to 255

JSON

{
  "flattening_enabled": true,
  "flattening_max_depth": 1,
  "flattening_max_key_length": 100
}
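To make the flattening behavior concrete, here is a minimal Python sketch of the idea. The `flatten_record` helper is hypothetical and is not the SDK's implementation; it assumes the `__` separator shown above and simply leaves anything nested deeper than `max_depth` untouched, which may differ from what the SDK does with deeper levels.

```python
def flatten_record(record, max_depth, separator="__", _parent="", _level=0):
    """Flatten nested dicts up to max_depth levels (illustrative helper only)."""
    flat = {}
    for key, value in record.items():
        # Prefix the key with its parent name once we are below the top level.
        name = f"{_parent}{separator}{key}" if _parent else key
        if isinstance(value, dict) and _level < max_depth:
            flat.update(
                flatten_record(value, max_depth, separator, name, _level + 1)
            )
        else:
            flat[name] = value
    return flat


record = {"user": {"first_name": "Jane", "last_name": "Carter", "id": "jcarter"}}
flat = flatten_record(record, max_depth=1)
print(sorted(flat))  # ['user__first_name', 'user__id', 'user__last_name']
```

With `max_depth=0` the helper returns the record unchanged, which mirrors leaving flattening effectively switched off.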

Out-of-scope capabilities¶

These capabilities are all out of scope by design:

Mappers do not support aggregation.
- To aggregate data, first land the data and then apply aggregations using a transformation tool like dbt.
Mappers do not support joins between streams.
- To join data, first land the data and then perform joins using a transformation tool like dbt.
Mappers do not support external API lookups.
- To add external API lookups, you can either (a) land all your data and then perform the joins using a transformation tool like dbt, or (b) create a custom mapper plugin with inline lookup logic.

A feature for all Singer users, enabled by the SDK¶

The mapping features described here are created for the users of SDK-based taps and targets, which support inline transformations with stream_maps and stream_map_config out of the box.

Note: to support non-SDK taps and targets, the standalone inline mapper plugin meltano-map-transformer follows all specifications defined here and can apply mapping transformations between any Singer tap and target, even if they are not built using the SDK.

The following behaviors are implemented by the SDK automatically:

For taps, the SCHEMA and RECORD messages will automatically be transformed, duplicated, filtered, or aliased, as per the stream_maps config settings after all other tap-specific logic is executed.
- Because this process happens automatically after all other tap logic is executed, the tap developer does not have to write any custom handling logic.
- The tap development process is fully insulated from this ‘out-of-box’ functionality.
Similarly for targets, the received streams are processed by the stream_maps config setting prior to any Sink processing functions.
- This means that the target developer can assume that all streams and records are transformed, aliased, filtered, etc. before any custom target code is executed.
The standalone mapper plugin meltano-map-transformer is a hybrid tap/target which simply receives input from a tap, transforms all stream and schema messages via the stream_maps config option, and then emits the resulting stream(s) to a downstream target.
- A standalone mapper is not needed when either the tap or the target is built on the SDK, since either can accept the stream_maps config option. It is useful with legacy taps or targets that do not yet support this functionality, or when you want to run a one-time sync with special logic and otherwise keep the tap and target config untouched.

Constructing the `stream_maps` config object¶

The stream_maps config expects a mapping of stream names to a structured transform object.

Here is a sample stream_maps transformation which obfuscates phone_number with a fake value, removes all references to email and adds email_domain and email_hash as new properties:

meltano.yml or config.json:

meltano.yml

stream_maps:
  # Apply these transforms to the stream called 'customers'
  customers:
    # drop the PII field from RECORD and SCHEMA messages
    email: __NULL__
    # capture just the email domain
    email_domain: owner_email.split('@')[-1]
    # for uniqueness checks
    email_hash: md5(config['hash_seed'] + owner_email)
    # generate a fake phone number
    phone_number: fake.phone_number()
stream_map_config:
  # hash outputs are not able to be replicated without the original seed:
  hash_seed: 01AWZh7A6DzGm6iJZZ2T
faker_config:
  # set specific seed
  seed: 0
  # set specific locales
  locale:
  - en_US
  - en_GB

JSON

{
    "stream_maps": {
        "customers": {
            "email": null,
            "email_domain": "owner_email.split('@')[-1]",
            "email_hash": "md5(config['hash_seed'] + owner_email)",
            "phone_number": "fake.phone_number()"
        }
    },
    "stream_map_config": {
        "hash_seed": "01AWZh7A6DzGm6iJZZ2T"
    },
    "faker_config": {
        "seed": 0,
        "locale": [
            "en_US",
            "en_GB"
        ]
    }
}

If map expressions should have access to special config, such as in the one-way hash algorithm above, define those config arguments within the optional stream_map_config setting. Values defined in stream_map_config will be available to expressions using the config dictionary.
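As a rough illustration of what the `customers` expressions above compute, the sketch below evaluates them by hand in plain Python. The sample record and its `owner_email` value are made up, and the local `md5` helper mirrors the definition given for the SDK's `md5()` function in the Built-In Functions table on this page.

```python
import hashlib


def md5(value: str) -> str:
    """Hex digest of the UTF-8 encoded input, as described for the SDK's md5()."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# `config` stands in for the values defined under `stream_map_config`.
config = {"hash_seed": "01AWZh7A6DzGm6iJZZ2T"}
# Hypothetical input record; the expressions only read `owner_email`.
record = {"owner_email": "jane@example.com", "email": "jane@example.com"}

email_domain = record["owner_email"].split("@")[-1]
email_hash = md5(config["hash_seed"] + record["owner_email"])

print(email_domain)     # example.com
print(len(email_hash))  # 32 (hex characters)
```

Because the seed is prepended before hashing, the same address hashes to a different value under a different `hash_seed`.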

Constructing Expressions¶

Expressions are defined and parsed using the simpleeval expression library. This library accepts most native Python expressions and is extended by custom functions which have been declared within the SDK.

Compound Expressions¶

Starting in version 0.33.0, the SDK supports the use of simple comprehensions, e.g. [x + 1 for x in [1,2,3]]. This is a powerful feature which allows you to perform complex transformations on lists of values. For example, you can use comprehensions to filter out values in an array:

meltano.yml

stream_maps:
  users:
    id: id
    fields: "[f for f in fields if f['key'] != 'age']"

JSON

{
  "stream_maps": {
    "users": {
      "id": "id",
      "fields": "[f for f in fields if f['key'] != 'age']"
    }
  }
}
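The comprehension in this example behaves like an ordinary Python list comprehension. Here is a small sketch with made-up `fields` data; the `key`/`value` layout of each entry is an assumption about the record shape.

```python
# Hypothetical value of the `fields` property on one `users` record.
fields = [
    {"key": "age", "value": 42},
    {"key": "city", "value": "Oslo"},
]

# Same expression as in the stream map: keep every entry except the 'age' one.
filtered = [f for f in fields if f["key"] != "age"]
print(filtered)  # [{'key': 'city', 'value': 'Oslo'}]
```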

Accessing Stream Properties within Mapping Expressions¶

By default, all stream properties are made available via the property’s given name. For instance, assuming a field called customer_id in the stream, you can write customer_id.lower() to apply Python’s lower() function to all customer IDs.

Note: In some cases, property names may collide with built-in functions or keywords. To handle these cases, use the record name as described below (or its shorthand _).

These are all equivalent means of transforming the customer_id property of the current record:

customer_id.lower()
record['customer_id'].lower()
_['customer_id'].lower()
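The equivalence of these three forms can be sketched in plain Python. The snippet below uses the built-in `eval()` with an explicit names dictionary purely as a stand-in for the SDK's simpleeval-based evaluator; it is not sandboxed and is not how the SDK is implemented. The sample record is made up.

```python
record = {"customer_id": "ABC-123"}

# Each property is exposed under its own name, plus the `record` and `_` aliases.
names = {**record, "record": record, "_": record}

expressions = [
    "customer_id.lower()",
    "record['customer_id'].lower()",
    "_['customer_id'].lower()",
]

# Stand-in evaluator: built-in eval with no builtins and only the names above.
results = [eval(expression, {"__builtins__": {}}, names) for expression in expressions]
print(results)  # ['abc-123', 'abc-123', 'abc-123']
```

The `record[...]` and `_[...]` forms keep working even when a property name is not a valid identifier or shadows another name, which is why they are the recommended fallback.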

Other Built-in Functions and Names¶

Currently, a small number of convenience functions and object aliases can be referenced directly by mapping expressions.

Built-In Functions¶

The following functions and namespaces are available for use in mapping expressions:

Function	Description
`md5()`	Returns an inline MD5 hash of any string, outputting the string representation of the hash’s hex digest. This is defined by the SDK internally with native Python: `hashlib.md5(<input>.encode("utf-8")).hexdigest()`.
`sha256()`	Returns an inline SHA256 hash of any string, outputting the string representation of the hash’s hex digest. This is defined by the SDK internally with native Python: `hashlib.sha256(<input>.encode("utf-8")).hexdigest()`.
`datetime`	This is the datetime module object from the Python standard library. You can access `datetime.datetime`, `datetime.timedelta`, etc.
`json`	This is the json module object from the Python standard library. Primarily used for calling `json.dumps()` and `json.loads()`.

Tip

With json.dumps(), you might want to pass the default argument. For example, json.dumps(obj, default=str) will call str() on an object if it is otherwise not serializable. This is useful for serializing datetime objects, decimal.Decimal instances, etc.
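The effect of the `default` argument can be seen with the standard library alone. The object below is a made-up example; only `json.dumps()` and its `default` parameter come from the tip above.

```python
import datetime
import decimal
import json

obj = {
    "created_at": datetime.datetime(2024, 1, 2, 3, 4, 5),
    "amount": decimal.Decimal("1.50"),
}

# Without `default`, json.dumps() rejects datetime and Decimal values.
try:
    json.dumps(obj)
    serializable = True
except TypeError:
    serializable = False

# With default=str, each unsupported value is passed through str() first.
encoded = json.dumps(obj, default=str)
print(serializable)  # False
print(encoded)       # {"created_at": "2024-01-02 03:04:05", "amount": "1.50"}
```

Note that the values come out as JSON strings, so a downstream consumer has to parse them back if it needs the original types.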

Built-in Variable Names¶

The following variables are available in the context of a mapping expression:

Variable	Description
`config`	A dictionary with the `stream_map_config` values from settings. This can be used to provide a secret hash seed, for instance.
`record`	An alias for the record values dictionary in the current stream.
`_`	Same as `record` but shorter to type.
`self`	The existing property value if the property already exists.
`fake`	A `Faker` instance, configurable via `faker_config` (see previous example) - see the built-in standard providers for available methods.
`__stream_name__`	The name of the stream. Useful when applying the same transformation to multiple streams. If used outside of `__alias__`, this will reference the aliased stream name.
`__original_stream_name__`	The original name of the stream. Useful when `__alias__` is specified but you want to reference the original (unaliased) stream name elsewhere (e.g. property mapping). If `__alias__` is not specified, this will reference the same value as `__stream_name__`.

Tip

To use the fake object, the faker library must be installed.

Added in version 0.35.0: The fake object.

Added in version 0.42.0: The __stream_name__ variable.

Added in version 0.48.0: The __original_stream_name__ variable.

Built-in Alias Variable Names¶

The following variables are available in the context of the __alias__ expression:

Variable	Description
`__stream_name__`	The existing stream name

Added in version 0.42.0: The __stream_name__ variable.

Automatic Schema Detection¶

For performance reasons, type detection is performed at runtime using text analysis of the provided expressions. Type detection is performed once per stream, prior to records being generated.

The following logic is applied in determining the SCHEMA of the transformed stream:

Calculations which begin with the text str(, float(, int(, or bool( are assumed to be of the specified type.
Otherwise, if the property already existed in the original stream, it is assumed to have the same data type as in the original stream.
Otherwise, if no type is detected using the above rules, any new stream properties are assumed to be of type str.
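The three rules above can be summarized as a small decision function. This is only a sketch of the documented logic, not the SDK's code: `detect_type`, `CAST_PREFIXES`, and the `original_types` dictionary are hypothetical names, and the returned values are JSON Schema type names.

```python
# Prefix-to-type table taken from the rules above (JSON Schema type names).
CAST_PREFIXES = {
    "str(": "string",
    "float(": "number",
    "int(": "integer",
    "bool(": "boolean",
}


def detect_type(property_name, expression, original_types):
    """Return the JSON Schema type name a mapped property would be given."""
    # Rule 1: an expression that starts with a cast takes that cast's type.
    for prefix, json_type in CAST_PREFIXES.items():
        if expression.startswith(prefix):
            return json_type
    # Rule 2: an existing property keeps its original type.
    if property_name in original_types:
        return original_types[property_name]
    # Rule 3: anything else is treated as a string.
    return "string"


original_types = {"count": "integer"}  # hypothetical source schema
print(detect_type("fixed_count", "int(count - 1)", original_types))  # integer
print(detect_type("count", "count - 1", original_types))             # integer
print(detect_type("label", "name.upper()", original_types))          # string
```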

Type Casting¶

Stream maps support explicit type casting to convert values to different data types. By wrapping expressions with type casting functions, you can ensure proper schema detection and data type conversion.

Supported Type Casting Functions¶

The following type casting functions are available in stream mapping expressions:

Function	Target Type	JSON Schema Type	Description
`int()`	Integer	`integer`	Converts a value to an integer type
`float()`	Decimal/Number	`number`	Converts a value to a floating-point number
`str()`	String	`string`	Converts a value to a string type
`bool()`	Boolean	`boolean`	Converts a value to a boolean type

Type Casting Examples¶

Converting string values to integers:

meltano.yml

stream_maps:
  mystream:
    # Convert a string literal to integer
    int_test: int('0')
    # Convert a calculation result to integer
    fixed_count: int(count - 1)
    # Extract year from date as integer
    create_year: int(datetime.date.fromisoformat(create_date).year)

JSON

{
  "stream_maps": {
    "mystream": {
      "int_test": "int('0')",
      "fixed_count": "int(count - 1)",
      "create_year": "int(datetime.date.fromisoformat(create_date).year)"
    }
  }
}

Converting values to floats:

meltano.yml

stream_maps:
  mystream:
    # Convert timestamp to float
    joined_timestamp: float(datetime.datetime.fromisoformat(joined_at).timestamp())

JSON

{
  "stream_maps": {
    "mystream": {
      "joined_timestamp": "float(datetime.datetime.fromisoformat(joined_at).timestamp())"
    }
  }
}

Converting values to strings:

meltano.yml

stream_maps:
  repositories:
    # Explicitly cast to string (useful for schema detection)
    description: str('[masked]')

JSON

{
  "stream_maps": {
    "repositories": {
      "description": "str('[masked]')"
    }
  }
}

Converting values to booleans:

meltano.yml

stream_maps:
  mystream:
    # Convert to boolean with null handling
    is_active: bool(status_value) if status_value else None

JSON

{
  "stream_maps": {
    "mystream": {
      "is_active": "bool(status_value) if status_value else None"
    }
  }
}
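To see what these cast expressions evaluate to, the sketch below runs a few of them in plain Python against made-up input values. The timezone offset in `joined_at` is an assumption added so that the timestamp does not depend on the local timezone, and `is_active` is a hypothetical wrapper around the boolean expression.

```python
import datetime

# Hypothetical property values from one record.
create_date = "2021-03-04"
joined_at = "2021-01-01T00:00:00+00:00"

create_year = int(datetime.date.fromisoformat(create_date).year)
joined_timestamp = float(datetime.datetime.fromisoformat(joined_at).timestamp())


def is_active(status_value):
    # Same expression as in the boolean example above.
    return bool(status_value) if status_value else None


print(create_year)          # 2021
print(joined_timestamp)     # 1609459200.0
print(is_active("active"))  # True
print(is_active(""))        # None
```

One consequence worth noticing: because falsy inputs take the `else None` branch, this particular boolean expression yields `True` or `None` and never `False`.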

When to Use Type Casting¶

Type casting is particularly useful in the following scenarios:

Schema Detection Hints: When creating new calculated fields, wrap the expression in a type casting function to ensure the SDK correctly detects the output type in the schema.
Type Conversion: When you need to convert a value from one type to another (e.g., converting a numeric string to an actual integer).
Consistent Data Types: When working with data that may have inconsistent types in the source, you can enforce a specific type in the output.

Example combining type casting with conditional logic:

meltano.yml

stream_maps:
  nested_jellybean:
    # Extract custom field value and cast to integer, handling null values
    custom_field_2: >-
      int(dict([(x["id"], x["value"]) for x in custom_fields]).get(2))
      if dict([(x["id"], x["value"]) for x in custom_fields]).get(2)
      else None

JSON

{
  "stream_maps": {
    "nested_jellybean": {
      "custom_field_2": "int(dict([(x[\"id\"], x[\"value\"]) for x in custom_fields]).get(2)) if dict([(x[\"id\"], x[\"value\"]) for x in custom_fields]).get(2) else None"
    }
  }
}
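The combined expression can be read more easily when unpacked into a function. In this sketch, `custom_field_2` is a hypothetical helper that performs the same lookup and cast, and the `custom_fields` sample data is made up.

```python
def custom_field_2(custom_fields):
    # Build an id -> value lookup, exactly as the expression does.
    lookup = dict([(x["id"], x["value"]) for x in custom_fields])
    # Cast to int only when a truthy value exists for id 2.
    return int(lookup.get(2)) if lookup.get(2) else None


with_value = [{"id": 1, "value": "a"}, {"id": 2, "value": "7"}]
without_value = [{"id": 1, "value": "a"}]

print(custom_field_2(with_value))     # 7
print(custom_field_2(without_value))  # None
```

The stream map version has to repeat the `dict(...)` construction because an expression cannot assign the intermediate result to a variable.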

Note

Type casting functions are standard Python built-ins that are available in the stream mapping expression evaluator. The SDK performs static type detection by examining the beginning of the expression string, so the type casting function must appear at the start of the expression for proper schema detection.

Known Limitations¶

The below functionality may be expanded or improved in the future. Please send us an Issue or MR if you are interested in contributing to these features.

No nested property declarations or removals¶

Only first-level properties may be added, removed, or transformed. This means, for example, that you can add or remove a top-level field called customer_email, but you cannot add or remove a nested email property if embedded in a customer json object.

Schema detection capabilities are limited¶

Schema detection currently relies on somewhat naive static text parsing. The workaround is simple: provide a type hint by wrapping the entire expression in str(), float(), int(), etc. While this is perhaps not optimal, it meets our core requirement for static type evaluation with minimal config complexity.

Security Implications for Low-Trust Environments¶

While simpleeval does provide some isolation and sandboxing capabilities built-in, there are always security implications when allowing user-provided code to run on managed servers. For this reason, administrators should not permit arbitrary setting injection from untrusted users. As a rule, tap and target settings should never be permitted to be modified by untrusted users.

Else behavior currently limited to `null` assignment¶

The only operation currently allowed for the __else__ instruction is null, meaning to exclude any streams or properties not otherwise defined. In the future, we may add additional options or advanced logic. For instance, we could in the future add the ability to remove or treat a property from any stream in which it appears. We could also hash any properties not otherwise declared in the map (for PII reasons and to enable advanced testing scenarios).

Q&A¶

Q: How do stream map operations interact with stream selection via the Singer catalog metadata?¶

Answer: Stream maps are applied only after stream selection rules are applied. This means that if a stream or property is not selected, it will not be available for stream map operations. Stream maps are not intended to be a replacement for catalog-based selection, but they may be used to further refine streams beyond the original selection parameters.

Q: If streams are excluded by applying mapping rules, does the tap automatically skip them?¶

Answer: It depends. For SDK-based taps, yes. If an entire stream is specified to be excluded at the tap level, then the stream will be skipped exactly as if it were deselected in the catalog metadata.

If a stream is specified to be excluded at the target level, or in a standalone mapper between the tap and target, the filtering occurs downstream from the tap and therefore cannot affect the selection rules of the tap itself. Except in special test cases or in cases where runtime is trivial, we highly recommend implementing stream-level exclusions at the tap level rather than within the downstream target or mapper plugins.

Q: Why use a separate `stream_map_config` option instead of granting access to all `config` values?¶

Answer: The base-level config is also the primary mechanism for submitting auth secrets to the plugin. If we provided direct access to all config options, it would drastically increase the security risks associated with code injection and accidental or malicious leakage of credentials to downstream logs. By limiting to only those config values intended for use by the mapper, we significantly improve the security profile of the feature. Additionally, plugins are generally expected to fail if they receive unexpected config arguments. The intended use cases for stream map config values are user-defined in nature (such as the hashing use case defined above), and are unlikely to overlap with the plugin’s already-existing settings.
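The isolation described in this answer can be pictured as follows. This is a conceptual sketch only: `expression_config` is a hypothetical helper, and `api_key` is a made-up auth setting standing in for whatever secrets a real plugin accepts.

```python
plugin_config = {
    "api_key": "s3cr3t",  # hypothetical auth secret, must not reach expressions
    "stream_maps": {
        "customers": {"email_hash": "md5(config['hash_seed'] + email)"},
    },
    "stream_map_config": {"hash_seed": "01AWZh7A6DzGm6iJZZ2T"},
}


def expression_config(plugin_config):
    """Return only the values that mapping expressions may see as `config`."""
    return dict(plugin_config.get("stream_map_config", {}))


config = expression_config(plugin_config)
print(config)               # {'hash_seed': '01AWZh7A6DzGm6iJZZ2T'}
print("api_key" in config)  # False
```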

Q: What is the difference between `primary_keys` and `key_properties`?¶

Answer: These two are generally identical - and will only differ in cases like the above where key_properties is manually overridden or nullified by the user of the tap. Developers will specify primary_keys for each stream in the tap, but they do not control if the user will override key_properties behavior when initializing the stream. Primary keys describe the nature of the upstream data as known by the source system. However, either through manual catalog manipulation and/or by setting stream map transformations, the in-flight dedupe keys (key_properties) may be overridden or nullified by the user at any time.

Additionally, some targets do not support primary key distinctions, and there are valid use cases to intentionally unset the key_properties in an extract-load pipeline. For instance, it is common to intentionally nullify key properties to trigger “append-only” loading behavior in certain targets, as may be required for historical reporting. This does not change the underlying nature of the primary_keys configuration in the upstream source data, only how it will be landed or deduped in the downstream target.

Q: How do I use Meltano environment variables to configure stream maps?¶

Answer: Environment variables in Meltano can be used to configure stream maps, but you first need to add the corresponding settings to your plugin’s settings option. For example:

plugins:
  extractors:
  - name: tap-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
    settings:
    - name: stream_maps.customers.email
    - name: stream_maps.customers.email_domain
    - name: stream_maps.customers.email_hash
    - name: stream_maps.customers.__else__
    - name: stream_map_config.hash_seed

Then, you can set the following environment variables:

TAP_CSV_STREAM_MAPS_CUSTOMERS_EMAIL_DOMAIN='email.split("@")[-1]'
TAP_CSV_STREAM_MAPS_CUSTOMERS_EMAIL_HASH='md5(config["hash_seed"] + email)'
TAP_CSV_STREAM_MAP_CONFIG_HASH_SEED='01AWZh7A6DzGm6iJZZ2T'
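The variable names above follow a visible pattern: the plugin name and the dotted setting name are joined, separators become underscores, and the result is upper-cased. The `env_var_name` helper below is hypothetical and only encodes that pattern as inferred from these examples; Meltano's own documentation is the authority on the exact rules.

```python
def env_var_name(plugin_name: str, setting_name: str) -> str:
    """Derive an environment variable name from a plugin and a setting name."""
    raw = f"{plugin_name}_{setting_name}"
    # Dashes and dots both become underscores, then everything is upper-cased.
    return raw.replace("-", "_").replace(".", "_").upper()


print(env_var_name("tap-csv", "stream_maps.customers.email_domain"))
# TAP_CSV_STREAM_MAPS_CUSTOMERS_EMAIL_DOMAIN
print(env_var_name("tap-csv", "stream_map_config.hash_seed"))
# TAP_CSV_STREAM_MAP_CONFIG_HASH_SEED
```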
