Ingestion mappings

This article describes ingestion mappings.

Ingestion mappings are used during ingestion to map incoming data to columns inside tables.

Data Explorer supports different types of mappings, both row-oriented (CSV, JSON, AVRO and W3CLOGFILE), and column-oriented (Parquet and ORC).

Ingestion mappings can be defined in the ingest command, or can be precreated and referenced from the ingest command using ingestionMappingReference parameters. Ingestion is possible without specifying a mapping. For more information, see identity mapping.

Each element in the mapping list is constructed from three fields:

Property	Required	Description
Column	✔️	Target column name in the table.
Datatype		Datatype with which to create the mapped column if it doesn’t already exist in the table.
Properties		Property-bag containing properties specific for each mapping as described in each specific mapping type page.

Supported mapping types

The following table defines mapping types to be used when ingesting or querying external data of a specific format.

Data Format	Mapping Type
CSV	CSV Mapping
TSV	CSV Mapping
TSVe	CSV Mapping
PSV	CSV Mapping
SCSV	CSV Mapping
SOHsv	CSV Mapping
TXT	CSV Mapping
RAW	CSV Mapping
JSON	JSON Mapping
AVRO	AVRO Mapping
APACHEAVRO	AVRO Mapping
Parquet	Parquet Mapping
ORC	ORC Mapping
W3CLOGFILE	W3CLOGFILE Mapping

Ingestion mapping examples

The following examples use the RawEvents table with the following schema:

.create table RawEvents (timestamp: datetime, deviceId: guid, messageId: guid, temperature: decimal, humidity: decimal)

Simple mapping

The following example shows ingestion where the mapping is defined in the ingest command. The command ingests a JSON file from a URL into the RawEvents table. The mapping specifies the path to each field in the JSON file.

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
    with (
            format = "json",
            ingestionMapping =
            ```
            [ 
              {"column":"timestamp","Properties":{"path":"$.timestamp"}},
              {"column":"deviceId","Properties":{"path":"$.deviceId"}},
              {"column":"messageId","Properties":{"path":"$.messageId"}},
              {"column":"temperature","Properties":{"path":"$.temperature"}},
              {"column":"humidity","Properties":{"path":"$.humidity"}}
            ]
            ```
          )

Mapping with `ingestionMappingReference`

To map the same JSON file using a precreated mapping, create the RawEventMapping ingestion mapping reference with the following command:

.create table RawEvents ingestion json mapping 'RawEventMapping' 
  ```
  [ 
    {"column":"timestamp","Properties":{"path":"$.timestamp"}},
    {"column":"deviceId","Properties":{"path":"$.deviceId"}},
    {"column":"messageId","Properties":{"path":"$.messageId"}},
    {"column":"temperature","Properties":{"path":"$.temperature"}},
    {"column":"humidity","Properties":{"path":"$.humidity"}}
  ]
  ```

Ingest the JSON file using the RawEventMapping ingestion mapping reference with the following command:

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
  with (
          format="json",
          ingestionMappingReference="RawEventMapping"
        )

Identity mapping

Ingestion is possible without specifying ingestionMapping or ingestionMappingReference properties. The data is mapped using an identity data mapping derived from the table’s schema. The table schema remains the same. format property should be specified. See ingestion formats.

Format type	Format	Mapping logic
Tabular data formats with defined order of columns, such as delimiter-separated or single-line formats.	CSV, TSV, TSVe, PSV, SCSV, Txt, SOHsv, Raw	All table columns are mapped in their respective order to data columns in order they appear in the data source. Column data type is taken from the table schema.
Formats with named columns or records with named fields.	JSON, Parquet, Avro, ApacheAvro, Orc, W3CLOGFILE	All table columns are mapped to data columns or record fields having the same name (case-sensitive). Column data type is taken from the table schema.

Mapping transformations

Some of the data format mappings (Parquet, JSON, and AVRO) support simple and useful ingest-time transformations. Where the scenario requires more complex processing at ingest time, use Update policy, which allows defining lightweight processing using KQL expression.

Path-dependant transformation	Description	Conditions
PropertyBagArrayToDictionary	Transforms JSON array of properties, such as `{events:[{"n1":"v1"},{"n2":"v2"}]}`, to dictionary and serializes it to valid JSON document, such as `{"n1":"v1","n2":"v2"}`.	Available for `JSON`, `Parquet`, `AVRO`, and `ORC` mapping types.
SourceLocation	Name of the storage artifact that provided the data, type string (for example, the blob’s “BaseUri” field).	Available for `CSV`, `JSON`, `Parquet`, `AVRO`, `ORC`, and `W3CLOGFILE` mapping types.
SourceLineNumber	Offset relative to that storage artifact, type long (starting with ‘1’ and incrementing per new record).	Available for `CSV`, `JSON`, `Parquet`, `AVRO`, `ORC`, and `W3CLOGFILE` mapping types.
DateTimeFromUnixSeconds	Converts number representing unix-time (seconds since 1970-01-01) to UTC datetime string.	Available for `CSV`, `JSON`, `Parquet`, `AVRO`, and `ORC` mapping types.
DateTimeFromUnixMilliseconds	Converts number representing unix-time (milliseconds since 1970-01-01) to UTC datetime string.	Available for `CSV`, `JSON`, `Parquet`, `AVRO`, and `ORC` mapping types.
DateTimeFromUnixMicroseconds	Converts number representing unix-time (microseconds since 1970-01-01) to UTC datetime string.	Available for `CSV`, `JSON`, `Parquet`, `AVRO`, and `ORC` mapping types.
DateTimeFromUnixNanoseconds	Converts number representing unix-time (nanoseconds since 1970-01-01) to UTC datetime string.	Available for `CSV`, `JSON`, `Parquet`, `AVRO`, and `ORC` mapping types.
DropMappedFields	Maps an object in the JSON document to a column and removes any nested fields already referenced by other column mappings.	Available for `JSON`, `Parquet`, `AVRO`, and `ORC` mapping types.
BytesAsBase64	Treats the data as byte array and converts it to a base64-encoded string.	Available for `AVRO` mapping type. For `ApacheAvro` format, the schema type of the mapped data field should be `bytes` or `fixed` Avro type. For `Avro` format, the field should be an array containing byte values from [0-255] range. `null` is ingested if the data doesn’t represent a valid byte array.

Mapping transformation examples

`DropMappedFields` transformation:

Given the following JSON contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "EventName": "CustomEvent",
        "Revenue": 0.456
    }
}

The following data mapping maps entire Props object into dynamic column Props while excluding already mapped columns (Props.EventName is already mapped into column EventName, so it’s excluded).

[
    { "Column": "Time", "Properties": { "Path": "$.Time" } },
    { "Column": "EventName", "Properties": { "Path": "$.Props.EventName" } },
    { "Column": "Props", "Properties": { "Path": "$.Props", "Transform":"DropMappedFields" } },
]

The ingested data looks as follows:

Time	EventName	Props
`2012-01-15T10:45`	`CustomEvent`	`{"Revenue": 0.456}`

`BytesAsBase64` transformation

Given the following AVRO file contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "id": [227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52]
    }
}

The following data mapping maps the ID column twice, with and without the transformation.

[
    { "Column": "ID", "Properties": { "Path": "$.props.id" } },
    { "Column": "Base64EncodedId", "Properties": { "Path": "$.props.id", "Transform":"BytesAsBase64" } },
]

The ingested data looks as follows:

ID	Base64EncodedId
`[227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52]`	`44MiXBxbQUiGigmFMy1oNA==`

Feedback

Was this page helpful?

Glad to hear it! Please tell us how we can improve.

Sorry to hear that. Please tell us how we can improve.