1 - AVRO Mapping

Learn how to use AVRO mapping to map data to columns inside tables upon ingestion.

Use AVRO mapping to map incoming data to columns inside tables when your ingestion source file is in AVRO format.

Each AVRO mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the AVRO record. |
| Path | string | If the value starts with $, it’s treated as the path to the field in the AVRO document. This path specifies the part of the AVRO document that becomes the content of the column in the table. The path that denotes the entire AVRO record is $. If the value doesn’t start with $, it’s treated as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the AVRO file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

JSON serialization

The following example mapping is serialized as a JSON string when provided as part of the .ingest management command.

[
  {"Column": "event_timestamp", "Properties": {"Field": "Timestamp"}},
  {"Column": "event_name",      "Properties": {"Field": "Name"}},
  {"Column": "event_type",      "Properties": {"Field": "Type"}},
  {"Column": "event_time",      "Properties": {"Field": "Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

Here the serialized JSON mapping is included in the context of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "AVRO",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Field": "Field1"}},
        {"Column": "column_b", "Properties": {"Field": "$.[\'Field name with space\']"}}
      ]
      ```
  )

Precreated mapping

When the mapping is precreated, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="AVRO",
        ingestionMappingReference = "Mapping_Name"
    )
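
If the mapping isn’t created yet, the following sketch shows one way to precreate it, following the same pattern as the JSON mapping creation command shown in the ingestion mappings article. The table name, mapping name, and mapping body are illustrative.

.create table Table123 ingestion avro mapping "Mapping_Name"
  ```
  [
    {"Column": "column_a", "Properties": {"Field": "Field1"}}
  ]
  ```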

Identity mapping

Use AVRO mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="AVRO"
    )

2 - CSV Mapping

Learn how to use CSV mapping to map data to columns inside tables upon ingestion.

Use CSV mapping to map incoming data to columns inside tables when your ingestion source file is any of the following delimiter-separated tabular formats: CSV, TSV, PSV, SCSV, SOHsv, TXT and RAW. For more information, see supported data formats.

Each CSV mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Ordinal | int | The column order number in the CSV file. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the CSV file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. The only supported transformation is SourceLocation. |

Examples

[
  {"Column": "event_time", "Properties": {"Ordinal": "0"}},
  {"Column": "event_name", "Properties": {"Ordinal": "1"}},
  {"Column": "event_type", "Properties": {"Ordinal": "2"}},
  {"Column": "ingestion_time", "Properties": {"ConstValue": "2023-01-01T10:32:00"}}
  {"Column": "source_location", "Properties": {"Transform": "SourceLocation"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv",
        ingestionMapping =
        ```
        [
            {"Column": "event_time", "Properties": {"Ordinal": "0"}},
            {"Column": "event_name", "Properties": {"Ordinal": "1"}},
            {"Column": "event_type", "Properties": {"Ordinal": "2"}},
            {"Column": "ingestion_time", "Properties": {"ConstValue": "2023-01-01T10:32:00"}},
            {"Column": "source_location", "Properties": {"Transform": "SourceLocation"}}
        ]
        ```
    )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv",
        ingestionMappingReference = "MappingName"
    )

Identity mapping

Use CSV mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv"
    )

3 - Ingestion mappings

This article describes ingestion mappings.

Ingestion mappings are used during ingestion to map incoming data to columns inside tables.

Data Explorer supports different types of mappings, both row-oriented (CSV, JSON, AVRO and W3CLOGFILE), and column-oriented (Parquet and ORC).

Ingestion mappings can be defined in the ingest command, or can be precreated and referenced from the ingest command using the ingestionMappingReference parameter. Ingestion is possible without specifying a mapping. For more information, see identity mapping.

Each element in the mapping list is constructed from three fields:

| Property | Required | Description |
|--|--|--|
| Column | ✔️ | Target column name in the table. |
| Datatype | | Datatype with which to create the mapped column if it doesn’t already exist in the table. |
| Properties | | Property-bag containing properties specific for each mapping as described in each specific mapping type page. |
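
For example, a single mapping element that maps a JSON field into a datetime column, creating the column if it doesn’t already exist, might look like the following sketch (the column name, datatype, and path are illustrative):

[
  {"Column": "event_time", "Datatype": "datetime", "Properties": {"Path": "$.timestamp"}}
]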

Supported mapping types

The following table defines mapping types to be used when ingesting or querying external data of a specific format.

| Data Format | Mapping Type |
|--|--|
| CSV | CSV Mapping |
| TSV | CSV Mapping |
| TSVe | CSV Mapping |
| PSV | CSV Mapping |
| SCSV | CSV Mapping |
| SOHsv | CSV Mapping |
| TXT | CSV Mapping |
| RAW | CSV Mapping |
| JSON | JSON Mapping |
| AVRO | AVRO Mapping |
| APACHEAVRO | AVRO Mapping |
| Parquet | Parquet Mapping |
| ORC | ORC Mapping |
| W3CLOGFILE | W3CLOGFILE Mapping |

Ingestion mapping examples

The following examples use the RawEvents table with the following schema:

.create table RawEvents (timestamp: datetime, deviceId: guid, messageId: guid, temperature: decimal, humidity: decimal) 

Simple mapping

The following example shows ingestion where the mapping is defined in the ingest command. The command ingests a JSON file from a URL into the RawEvents table. The mapping specifies the path to each field in the JSON file.

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
    with (
            format = "json",
            ingestionMapping =
            ```
            [ 
              {"column":"timestamp","Properties":{"path":"$.timestamp"}},
              {"column":"deviceId","Properties":{"path":"$.deviceId"}},
              {"column":"messageId","Properties":{"path":"$.messageId"}},
              {"column":"temperature","Properties":{"path":"$.temperature"}},
              {"column":"humidity","Properties":{"path":"$.humidity"}}
            ]
            ```
          )

Mapping with ingestionMappingReference

To map the same JSON file using a precreated mapping, create the RawEventMapping ingestion mapping reference with the following command:

.create table RawEvents ingestion json mapping 'RawEventMapping' 
  ```
  [ 
    {"column":"timestamp","Properties":{"path":"$.timestamp"}},
    {"column":"deviceId","Properties":{"path":"$.deviceId"}},
    {"column":"messageId","Properties":{"path":"$.messageId"}},
    {"column":"temperature","Properties":{"path":"$.temperature"}},
    {"column":"humidity","Properties":{"path":"$.humidity"}}
  ]
  ```

Ingest the JSON file using the RawEventMapping ingestion mapping reference with the following command:

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
  with (
          format="json",
          ingestionMappingReference="RawEventMapping"
        )

Identity mapping

Ingestion is possible without specifying the ingestionMapping or ingestionMappingReference properties. The data is mapped using an identity data mapping derived from the table’s schema. The table schema remains the same. The format property should be specified. See ingestion formats.

| Format type | Format | Mapping logic |
|--|--|--|
| Tabular data formats with a defined order of columns, such as delimiter-separated or single-line formats. | CSV, TSV, TSVe, PSV, SCSV, Txt, SOHsv, Raw | All table columns are mapped, in their respective order, to the data columns in the order they appear in the data source. Column data type is taken from the table schema. |
| Formats with named columns or records with named fields. | JSON, Parquet, Avro, ApacheAvro, Orc, W3CLOGFILE | All table columns are mapped to data columns or record fields having the same name (case-sensitive). Column data type is taken from the table schema. |
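
For example, assuming a JSON source whose field names match the target table’s column names, an identity-mapped ingestion needs only the format property. The following sketch reuses the sample file and RawEvents table from the examples above:

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json')
  with (format="json")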

Mapping transformations

Some of the data format mappings (Parquet, JSON, and AVRO) support simple and useful ingest-time transformations. If the scenario requires more complex processing at ingest time, use an update policy, which allows defining lightweight processing using KQL expressions.

| Path-dependent transformation | Description | Conditions |
|--|--|--|
| PropertyBagArrayToDictionary | Transforms a JSON array of properties, such as {events:[{"n1":"v1"},{"n2":"v2"}]}, to a dictionary and serializes it to a valid JSON document, such as {"n1":"v1","n2":"v2"}. | Available for JSON, Parquet, AVRO, and ORC mapping types. |
| SourceLocation | Name of the storage artifact that provided the data, type string (for example, the blob’s "BaseUri" field). | Available for CSV, JSON, Parquet, AVRO, ORC, and W3CLOGFILE mapping types. |
| SourceLineNumber | Offset relative to that storage artifact, type long (starting with 1 and incrementing per new record). | Available for CSV, JSON, Parquet, AVRO, ORC, and W3CLOGFILE mapping types. |
| DateTimeFromUnixSeconds | Converts a number representing unix-time (seconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DateTimeFromUnixMilliseconds | Converts a number representing unix-time (milliseconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DateTimeFromUnixMicroseconds | Converts a number representing unix-time (microseconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DateTimeFromUnixNanoseconds | Converts a number representing unix-time (nanoseconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DropMappedFields | Maps an object in the JSON document to a column and removes any nested fields already referenced by other column mappings. | Available for JSON, Parquet, AVRO, and ORC mapping types. |
| BytesAsBase64 | Treats the data as a byte array and converts it to a base64-encoded string. | Available for the AVRO mapping type. For the ApacheAvro format, the schema type of the mapped data field should be the bytes or fixed Avro type. For the Avro format, the field should be an array containing byte values from the [0-255] range. null is ingested if the data doesn’t represent a valid byte array. |

Mapping transformation examples

DropMappedFields transformation:

Given the following JSON contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "EventName": "CustomEvent",
        "Revenue": 0.456
    }
}

The following data mapping maps the entire Props object into the dynamic column Props while excluding fields that are already mapped by other columns (Props.EventName is already mapped into the EventName column, so it’s excluded).

[
    { "Column": "Time", "Properties": { "Path": "$.Time" } },
    { "Column": "EventName", "Properties": { "Path": "$.Props.EventName" } },
    { "Column": "Props", "Properties": { "Path": "$.Props", "Transform":"DropMappedFields" } },
]

The ingested data looks as follows:

| Time | EventName | Props |
|--|--|--|
| 2012-01-15T10:45 | CustomEvent | {"Revenue": 0.456} |

BytesAsBase64 transformation

Given the following AVRO file contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "id": [227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52]
    }
}

The following data mapping maps the ID column twice, with and without the transformation.

[
    { "Column": "ID", "Properties": { "Path": "$.props.id" } },
    { "Column": "Base64EncodedId", "Properties": { "Path": "$.props.id", "Transform":"BytesAsBase64" } },
]

The ingested data looks as follows:

| ID | Base64EncodedId |
|--|--|
| [227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52] | 44MiXBxbQUiGigmFMy1oNA== |

4 - JSON Mapping

Learn how to use JSON mapping to map data to columns inside tables upon ingestion.

Use JSON mapping to map incoming data to columns inside tables when your ingestion source file is in JSON format.

Each JSON mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Path | string | If the value starts with $, it’s interpreted as the JSON path to the field in the JSON document that will become the content of the column in the table. The JSON path that denotes the entire document is $. If the value doesn’t start with $, it’s interpreted as a constant value. JSON paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the JSON file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "source_uri",      "Properties": {"Transform": "SourceLocation"}},
  {"Column": "source_line",     "Properties": {"Transform": "SourceLineNumber"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "json",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Path": "$.Obj.Property"}},
        {"Column": "column_b", "Properties": {"Path": "$.Property"}},
        {"Column": "custom_column", "Properties": {"Path": "$.[\'Property name with space\']"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="json",
        ingestionMappingReference = "Mapping_Name"
    )

Identity mapping

Use JSON mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="json"
    )

Copying JSON mapping

You can copy JSON mapping of an existing table and create a new table with the same mapping using the following process:

  1. Run the following command on the table whose mapping you want to copy:

    .show table TABLENAME ingestion json mappings
    | extend formatted_mapping = strcat("'",replace_string(Mapping, "'", "\\'"),"'")
    | project formatted_mapping
    
  2. Use the output of the above command to create a new table with the same mapping:

    .create table TABLENAME ingestion json mapping "TABLENAME_Mapping" RESULT_OF_ABOVE_CMD
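
As an illustration of step 2, if the first command returns a formatted mapping such as '[{"column":"timestamp","Properties":{"path":"$.timestamp"}}]', the command to create the mapping on a hypothetical table named NewTable might look like the following sketch:

    .create table NewTable ingestion json mapping "NewTable_Mapping" '[{"column":"timestamp","Properties":{"path":"$.timestamp"}}]'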
    

5 - ORC Mapping

Learn how to use ORC mapping to map data to columns inside tables upon ingestion.

Use ORC mapping to map incoming data to columns inside tables when your ingestion source file is in ORC format.

Each ORC mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the ORC record. |
| Path | string | If the value starts with $, it’s interpreted as the path to the field in the ORC document that will become the content of the column in the table. The path that denotes the entire ORC record is $. If the value doesn’t start with $, it’s interpreted as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the ORC file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "orc",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Path": "$.Field1"}},
        {"Column": "column_b", "Properties": {"Path": "$.[\'Field name with space\']"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="orc",
        ingestionMappingReference = "ORC_Mapping"
    )

Identity mapping

Use ORC mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="orc"
    )

6 - Parquet Mapping

Learn how to use Parquet mapping to map data to columns inside tables upon ingestion and optimize data processing in Kusto.

Use Parquet mapping to map incoming data to columns inside tables when your ingestion source file is in Parquet format.

Each Parquet mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the Parquet record. |
| Path | string | If the value starts with $, it’s interpreted as the path to the field in the Parquet document that will become the content of the column in the table. The path that denotes the entire Parquet record is $. If the value doesn’t start with $, it’s interpreted as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the Parquet file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Parquet type conversions

Comprehensive support is provided for converting data types when you’re ingesting or querying data from a Parquet source.

The following table maps Parquet field types to the table column types they can be converted to. The first column lists the Parquet type, and the remaining columns indicate the target column types.

Parquet typeboolintlongrealdecimaldatetimetimespanstringguiddynamic
INT8✔️✔️✔️✔️✔️✔️
INT16✔️✔️✔️✔️✔️✔️
INT32✔️✔️✔️✔️✔️✔️
INT64✔️✔️✔️✔️✔️✔️
UINT8✔️✔️✔️✔️✔️✔️
UINT16✔️✔️✔️✔️✔️✔️
UINT32✔️✔️✔️✔️✔️✔️
UINT64✔️✔️✔️✔️✔️
FLOAT32✔️✔️✔️✔️✔️✔️
FLOAT64✔️✔️✔️✔️✔️✔️
BOOLEAN✔️✔️✔️
DECIMAL (I32)✔️✔️✔️✔️✔️✔️
DECIMAL (I64)✔️✔️✔️✔️✔️✔️
DECIMAL (FLBA)✔️✔️✔️✔️
DECIMAL (BA)✔️✔️✔️✔️✔️✔️
TIMESTAMP✔️✔️
DATE✔️✔️
STRING✔️✔️✔️✔️✔️✔️
UUID✔️✔️
JSON✔️✔️
LIST✔️
MAP✔️
STRUCT✔️

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format = "parquet",
    ingestionMapping =
    ```
    [
      {"Column": "column_a", "Properties": {"Path": "$.Field1.Subfield"}},
      {"Column": "column_b", "Properties": {"Path": "$.[\'Field name with space\']"}},
    ]
    ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format="parquet",
      ingestionMappingReference = "Mapping_Name"
  )

Identity mapping

Use Parquet mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format="parquet"
  )

7 - Queued ingestion commands use case

Learn how to ingest historical data using the queued ingestion commands.

The queued ingestion commands allow you to ingest individual blobs by URL or ingest batches of data by listing folders or containers. This article walks through a common use case: fine-tuning the ingestion of historical data. You can use these commands to test how historical data is ingested and resolve any issues before performing full ingestion. The following tasks demonstrate how to use queued ingestion commands effectively:

Ingest single blobs (Preview)

You can start by ingesting a single blob directly using its URL. Make sure to include a SAS token or use a managed identity to grant the service permission to access and download the blob.

.ingest-from-storage-queued into table database('TestDatabase').Logs
with (format='csv', enableTracking=true)
<|
'https://sample.blob.core.windows.net/sample/test_*csv?...'

Output

| IngestionOperationId | ClientRequestId | OperationInfo |
|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | Kusto.Web.KWE,Query;11112222;11112222;22223333-bbbb-3333-cccc-4444cccc5555 | .show queued ingestion operations "00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444" |

List blobs in a folder

To understand the historical data better, you list a maximum of 10 blobs from the Azure blob storage container.

.list blobs (
    "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
)
MaxFiles=10

Output

| BlobUri | SizeInBytes | CapturedVariables |
|--|--|--|
| https://<BlobStorageLocation>/<FolderName>/part-100.parquet | 7,429,062 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-101.parquet | 262,610 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-102.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-103.parquet | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-104.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-105.parquet | 7,441,587 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-106.parquet | 1,087,425 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-107.parquet | 6,238,357 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-208.csv | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-109.parquet | 6,338,148 | {} |

You can now verify that these are the correct blobs to ingest.

Ingest folder

Next, you queue the 10 listed files for ingestion into the Logs table in the TestDatabase database, with tracking enabled for the ingestion.

.ingest-from-storage-queued into table database('TestDatabase').Logs
EnableTracking=true
with (format='parquet')
<|
    .list blobs (
        "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
    )
    MaxFiles=10

Output

| IngestionOperationId | ClientRequestId | OperationInfo |
|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | Kusto.Web.KWE,Query;11112222;11112222;22223333-bbbb-3333-cccc-4444cccc5555 | .show queued ingestion operations "00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444" |

The OperationInfo, which includes the IngestionOperationId, is then used to track the ingestion status.

Track ingestion status

You run the .show queued ingestion operations command to check whether the ingestion is complete or if there are any errors.

.show queued ingestion operations "00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444"

Output

| IngestionOperationId | Started On | Last Updated On | State | Discovered | InProgress | Ingested | Failed | Canceled | SampleFailedReasons | Database | Table |
|--|--|--|--|--|--|--|--|--|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | 2025-03-19 14:57:41.0000000 | 2025-01-10 15:15:04.0000000 | Completed | 10 | 0 | 10 | 0 | 0 | | TestDatabase | Logs |

If the State isn’t Completed, run the .show queued ingestion operations command again to monitor the increase in the number of ingested blobs until the State changes to Completed. You can also cancel the ingestion, if necessary.

Filter queued files for ingestion

After examining the results of the ingestion, you list the blobs for ingestion again. This time the parquet suffix is added to ensure that only Parquet files are queued.

.list blobs (
    "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
)
Suffix="parquet"
MaxFiles=10

Output

| BlobUri | SizeInBytes | CapturedVariables |
|--|--|--|
| https://<BlobStorageLocation>/<FolderName>/part-100.parquet | 7,429,062 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-101.parquet | 262,610 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-102.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-103.parquet | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-104.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-105.parquet | 7,441,587 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-106.parquet | 1,087,425 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-107.parquet | 6,238,357 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-108.parquet | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-109.parquet | 6,338,148 | {} |

Capture the creation time

A path format is added to capture the creation time.

.list blobs (
    "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
)
Suffix="parquet"
MaxFiles=10
PathFormat=("output/03/Year=" datetime_pattern("yyyy'/Month='MM'/Day='dd", creationTime) "/")

Output

| BlobUri | SizeInBytes | CapturedVariables |
|--|--|--|
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-100.parquet | 7,429,062 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-101.parquet | 262,610 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-102.parquet | 6,154,166 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-103.parquet | 7,460,408 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-104.parquet | 6,154,166 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-105.parquet | 7,441,587 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-106.parquet | 1,087,425 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-107.parquet | 6,238,357 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-108.parquet | 7,460,408 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-109.parquet | 6,338,148 | {"creationTime": "03/20/2025 00:00:00"} |

The CapturedVariables column dates match the dates specified in the BlobUri column.

Ingest 20 files

Now 20 files in parquet format are ingested from the Azure blob storage container, along with their creation time.

.ingest-from-storage-queued into table database('TestDatabase').Logs
EnableTracking=true
with (format='parquet')
<|
    .list blobs (
        "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
    )
    Suffix="parquet"
    MaxFiles=20
    PathFormat=("output/03/Year=" datetime_pattern("yyyy'/Month='MM'/Day='dd", creationTime) "/")

Output

| IngestionOperationId | ClientRequestId | OperationInfo |
|--|--|--|
| 22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555 | Kusto.Web.KWE,Query;22223333;22223333;33334444-dddd-4444-eeee-5555eeee5555 | .show queued ingestion operations "22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555" |

The OperationInfo is then used to track the ingestion status.

Track follow-up ingestion status

The .show queued ingestion operations command is run to check whether there are any issues with this ingestion.

.show queued ingestion operations "22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555"

Output

| IngestionOperationId | Started On | Last Updated On | State | Discovered | InProgress | Ingested | Failed | Canceled | SampleFailedReasons | Database | Table |
|--|--|--|--|--|--|--|--|--|--|--|--|
| 22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555 | 2025-02-20 14:57:41.0000000 | | InProgress | 10 | 10 | 0 | 0 | 0 | | TestDatabase | Logs |

The .show extents command is run to check whether the extents were created with the earlier creation date, preserving data integrity and historical accuracy.

.show table Logs extents

The MinCreatedOn and MaxCreatedOn values should show the data creation time, rather than the data ingestion time. For more information about these returns, see .show extents.

You can cancel the ingestion, if necessary.

Perform your full ingestion

By running queued ingestion commands on a sample, you can discover problems your ingestion might encounter. Now that you’ve fixed them, you’re ready to ingest all your historical data and wait for the full ingestion to complete.
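
A minimal sketch of that full ingestion, assuming the MaxFiles limit can simply be omitted so that all matching blobs in the folder are queued (storage location and path format as in the earlier steps):

.ingest-from-storage-queued into table database('TestDatabase').Logs
EnableTracking=true
with (format='parquet')
<|
    .list blobs (
        "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
    )
    Suffix="parquet"
    PathFormat=("output/03/Year=" datetime_pattern("yyyy'/Month='MM'/Day='dd", creationTime) "/")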

Cancel ingestion

At any time during the ingestion process, you can cancel your queued ingestion.

.cancel queued ingestion operation '22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555'

Output

| IngestionOperationId | Started On | Last Updated On | State | Discovered | Pending | Canceled | Ingested | Failed | SampleFailedReasons | Database | Table |
|--|--|--|--|--|--|--|--|--|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | 2025-03-20 15:03:11.0000000 | | Canceled | 10 | 10 | 0 | 0 | 0 | | TestDatabase | Logs |

You can then roll back the ingestion, fix the issues, and rerun the ingestion.

8 - Queued ingestion overview commands

Learn about queued ingestion and its commands.

Queued ingestion commands allow you to ingest specific folders or an entire container, and to manage the operations related to queued ingestion. You can also ingest multiple or individual blobs by URL and from a source file. The ingestion commands are useful for preparing and testing distinct ingestion scenarios before the final ingestion. Using them helps to ensure that fields, columns, partitioning, and other needs are handled properly during ingestion.

Management commands

Queued storage commands include the commands used in the preceding use case:

  • .ingest-from-storage-queued
  • .list blobs
  • .show queued ingestion operations
  • .cancel queued ingestion operation

9 - W3CLOGFILE Mapping

Learn how to use W3CLOGFILE mapping to map data to columns inside tables upon ingestion.

Use W3CLOGFILE mapping to map incoming data to columns inside tables when your ingestion source file is in W3CLOGFILE format.

Each W3CLOGFILE mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the W3CLOGFILE log record. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the W3CLOGFILE file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

[
   {"Column": "Date",          "Properties": {"Field": "date"}},
   {"Column": "Time",          "Properties": {"Field": "time"}},
   {"Column": "IP",            "Properties": {"Field": "s-ip"}},
   {"Column": "ClientMethod",  "Properties": {"Field": "cs-method"}},
   {"Column": "ClientQuery",   "Properties": {"Field": "cs-uri-query"}},
   {"Column": "ServerPort",    "Properties": {"Field": "s-port"}},
   {"Column": "ClientIP",      "Properties": {"Field": "c-ip"}},
   {"Column": "UserAgent",     "Properties": {"Field": "cs(User-Agent)"}},
   {"Column": "Referer",       "Properties": {"Field": "cs(Referer)"}},
   {"Column": "Status",        "Properties": {"Field": "sc-status"}},
   {"Column": "ResponseBytes", "Properties": {"Field": "sc-bytes"}},
   {"Column": "RequestBytes",  "Properties": {"Field": "cs-bytes"}},
   {"Column": "TimeTaken",     "Properties": {"Field": "time-taken"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "w3clogfile",
      ingestionMapping =
      ```
      [
         {"Column": "column_a", "Properties": {"Field": "field1"}},
         {"Column": "column_b", "Properties": {"Field": "field2"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="w3clogfile",
        ingestionMappingReference = "Mapping_Name"
    )

Identity mapping

Use W3CLOGFILE mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="w3clogfile"
    )

10 - .ingest inline command (push)

This article describes the .ingest inline command (push).

This command inserts data into a table by pushing the data included within the command to the table.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] <| Data

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] [ Data ]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| TableName | string | ✔️ | The name of the table into which to ingest data. The table name is always relative to the database in context. Its schema is the default schema assumed for the data if no schema mapping object is provided. |
| Data | string | ✔️ | The data content to ingest. Unless otherwise modified by the ingestion properties, this content is parsed as CSV. |
| IngestionPropertyName, IngestionPropertyValue | string | | Any number of ingestion properties that affect the ingestion process. |

Returns

The result is a table with as many records as the number of generated data shards (“extents”). If no data shards are generated, a single record is returned with an empty (zero-valued) extent ID.

| Name | Type | Description |
|--|--|--|
| ExtentId | guid | The unique identifier for the data shard that’s generated by the command. |

Examples

Ingest with <| syntax

The following command ingests data into a table Purchases with two columns: SKU (of type string) and Quantity (of type long).

.ingest inline into table Purchases <|
    Shoes,1000
    Wide Shoes,50
    "Coats black",20
    "Coats with ""quotes""",5

Ingest with bracket syntax

The following command ingests data into a table Logs with two columns: Date (of type datetime) and EventDetails (of type dynamic).

.ingest inline into table Logs
    [2015-01-01,"{""EventType"":""Read"", ""Count"":""12""}"]
    [2015-01-01,"{""EventType"":""Write"", ""EventValue"":""84""}"]

11 - .show data operations

Learn how to use the .show data operations command to return data operations that reached a final state.

Returns a table with data operations that reached a final state. Data operations are available for 30 days from when they ran.

Any operation that results in new extents (data shards) added to the system is considered a data operation.

Permissions

You must have Database Admin or Database Monitor permissions to see any data operations invoked on your database.

Any user can see their own data operations.

For more information, see Kusto role-based access control.

Syntax

.show data operations

Returns

This command returns a table with the following columns:

| Output parameter | Type | Description |
|--|--|--|
| Timestamp | datetime | The time when the operation reached its final state. |
| Database | string | The database name. |
| Table | string | The table name. |
| ClientActivityId | string | The operation client activity ID. |
| OperationKind | string | One of BatchIngest, SetOrAppend, RowStoreSeal, MaterializedView, QueryAcceleration, and UpdatePolicy. |
| OriginalSize | long | The original size of the ingested data. |
| ExtentSize | long | The extent size. |
| RowCount | long | The number of rows in the extent. |
| ExtentCount | int | The number of extents. |
| TotalCpu | timespan | The total CPU time used by the data operation. |
| Duration | timespan | The duration of the operation. |
| Principal | string | The identity that initiated the data operation. |
| Properties | dynamic | Additional information about the data operation. |

Example

The following example returns information about UpdatePolicy, BatchIngest, and SetOrAppend operations.

.show data operations

Output

| Timestamp | Database | Table | ClientActivityId | OperationKind | OriginalSize | ExtentSize | RowCount | ExtentCount | TotalCpu | Duration | Principal | Properties |
|--|--|--|--|--|--|--|--|--|--|--|--|--|
| 2024-07-18 15:21:10.5432134 | TestLogs | UTResults | DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1 | UpdatePolicy | 100,829 | 75,578 | 279 | 1 | 00:00:00.2656250 | 00:00:28.9101535 | aadapp=xxx | {"SourceTable": "UTLogs"} |
| 2024-07-18 15:21:12.9481819 | TestLogs | UTLogs | DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1 | BatchIngest | 1,045,027,298 | 123,067,947 | 1,688,705 | 2 | 00:00:22.9843750 | 00:00:29.9745733 | aadapp=xxx | {"Format": "Csv", "NumberOfInputStreams": 2} |
| 2024-07-18 15:21:16.1095441 | KustoAuto | Incident | KustoGPTSummary;cdef12345-6789-ghij-0123-klmn45678 | SetOrAppend | 1,420 | 3,190 | 1 | 1 | 00:00:00.0156250 | 00:00:00.0638211 | aaduser=xxx | |
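
Because the command returns a table, its output can be piped into query operators to narrow the results. The following sketch (the filter values are illustrative) keeps only batch ingestions into a single database:

.show data operations
| where Database == "TestLogs" and OperationKind == "BatchIngest"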

12 - Data formats supported for ingestion

Learn about the various data and compression formats supported for ingestion.

Data ingestion adds data to a table and makes it available for query. For all ingestion methods, other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that are supported for data ingestion.

To learn why ingestion might fail, see Ingestion failures and Ingestion error codes in Azure Data Explorer.

| Format | Extension | Description |
|--|--|--|
| ApacheAvro | .avro | An Avro format that supports logical types. Supported compression codecs: null, deflate, and snappy. The reader implementation of the apacheavro format is based on the official Apache Avro library. For details on ingesting Event Hubs Capture Avro files, see Ingesting Event Hubs Capture Avro files. |
| Avro | .avro | A legacy implementation of the Avro format based on the .NET library. Supported compression codecs: null and deflate. To use snappy, use the ApacheAvro data format. |
| AzMonStream | N/A | Azure Monitor exports data in this format to Azure Event Hubs. This format is supported only by Azure Event Hubs. |
| CSV | .csv | A text file with comma-separated values (,). See RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files. |
| JSON | .json | A text file with JSON objects delimited by \n or \r\n. See JSON Lines (JSONL). |
| MultiJSON | .multijson | A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, \n, or \r\n. Each property bag can span multiple lines. |
| ORC | .orc | An ORC file. |
| Parquet | .parquet | A Parquet file. |
| PSV | .psv | A text file with pipe-separated values (\|). |
| RAW | .raw | A text file whose entire contents are a single string value. |
| SCsv | .scsv | A text file with semicolon-separated values (;). |
| SOHsv | .sohsv | A text file with SOH-separated values. (SOH is ASCII codepoint 1; this format is used by Hive on HDInsight.) |
| TSV | .tsv | A text file with tab-separated values (\t). |
| TSVE | .tsv | A text file with tab-separated values (\t). A backslash character (\) is used for escaping. |
| TXT | .txt | A text file with lines delimited by \n. Empty lines are skipped. |
| W3CLOGFILE | .log | Web log file format standardized by the W3C. |

For more information about ingesting data by using the json or multijson formats, see Ingest JSON formats.

Supported data compression formats

Compress blobs and files with these algorithms:

| Compression | Extension |
|--|--|
| gzip | .gz |
| zip | .zip |

Indicate compression by appending the extension to the blob or file name.

For example:

  • MyData.csv.zip indicates a blob or file formatted as CSV, compressed with zip (archive or single file).
  • MyData.json.gz indicates a blob or file formatted as JSON, compressed with gzip.

Blob or file names that include only the compression extension (for example, MyData.zip) are also supported. In this case, specify the file format as an ingestion property because it can’t be inferred.
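
For example, a blob that carries only the compression extension could be ingested with the format stated explicitly. This is a sketch; the table name and URL are placeholders.

.ingest into table T ('https://contoso.blob.core.windows.net/container/MyData.zip?...')
  with (format='csv')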

13 - Data ingestion properties

Learn about the various data ingestion properties.

Data ingestion adds data to a table and makes it available for query. Add properties to the ingestion command after the with keyword.
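
For example, a sketch of an ingestion command that passes the format and a precreated mapping reference as ingestion properties (the table name, URL, and mapping name are placeholders):

.ingest into table MyTable ('https://contoso.blob.core.windows.net/container/file.json?...')
  with (format='json', ingestionMappingReference='MyTable_Mapping')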

14 - Ingest from query

14.1 - .cancel operation command

Learn how to use the .cancel operation command to cancel a long-running operation.

This command cancels a long-running ingest from query operation. This command is useful when the operation is taking too long and you would like to abort it while running.

The cancel operation command isn’t guaranteed to succeed. The output of the .cancel operation command indicates whether or not cancellation was successful.

Syntax

.cancel operation OperationId [with ( reason = ReasonPhrase )]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| OperationId | guid | ✔️ | A guid of the operation ID returned from the running command. |
| ReasonPhrase | string | | The reason for canceling the running command. |

Returns

| Output parameter | Type | Description |
|--|--|--|
| OperationId | guid | The operation ID of the operation that was canceled. |
| Operation | string | The operation kind that was canceled. |
| StartedOn | datetime | The start time of the operation that was canceled. |
| CancellationState | string | Returns one of the following options: Canceled successfully: the operation was canceled. Cancel failed: the operation can’t be canceled at this point. The operation may still be running or may have completed. |
| ReasonPhrase | string | The reason why cancellation wasn’t successful. |

Example

.cancel operation 078b2641-f10d-4694-96f8-1ee2b75dda48 with(Reason="Command canceled by me")

| OperationId | Operation | StartedOn | CancellationState | ReasonPhrase |
|--|--|--|--|--|
| c078b2641-f10d-4694-96f8-1ee2b75dda48 | TableSetOrAppend | 2022-07-18 09:03:55.1387320 | Canceled successfully | Command canceled by me |

14.2 - Kusto query ingestion (set, append, replace)

Learn how to use the .set, .append, .set-or-append, and .set-or-replace commands to ingest data from a query.

These commands execute a query or a management command and ingest the results of the query into a table. The difference between these commands is how they treat existing or nonexistent tables and data.

| Command | If table exists | If table doesn’t exist |
|--|--|--|
| .set | The command fails. | The table is created and data is ingested. |
| .append | Data is appended to the table. | The command fails. |
| .set-or-append | Data is appended to the table. | The table is created and data is ingested. |
| .set-or-replace | Data replaces the data in the table. | The table is created and data is ingested. |

To cancel an ingest from query command, see cancel operation.

Permissions

To perform different actions on a table, you need specific permissions:

  • To add rows to an existing table using the .append command, you need a minimum of Table Ingestor permissions.
  • To create a new table using the various .set commands, you need a minimum of Database User permissions.
  • To replace rows in an existing table using the .set-or-replace command, you need a minimum of Table Admin permissions.

For more information on permissions, see Kusto role-based access control.

Syntax

(.set | .append | .set-or-append | .set-or-replace) [async] tableName [with (propertyName = propertyValue [, …])] <| queryOrCommand

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| async | string | | If specified, the command returns immediately and continues ingestion in the background. Use the returned OperationId with the .show operations command to retrieve the ingestion completion status and results. |
| tableName | string | ✔️ | The name of the table to ingest data into. The tableName is always relative to the database in context. |
| propertyName, propertyValue | string | | One or more supported ingestion properties used to control the ingestion process. |
| queryOrCommand | string | ✔️ | The text of a query or a management command whose results are used as data to ingest. Only .show management commands are supported. |

Performance tips

  • Set the distributed property to true if the amount of data produced by the query is large, exceeds one gigabyte (GB), and doesn’t require serialization. Then, multiple nodes can produce output in parallel. Don’t use this flag when query results are small, since it might needlessly generate many small data shards.
  • Data ingestion is a resource-intensive operation that might affect concurrent activities on the database, including running queries. Avoid running too many ingestion commands at the same time.
  • Limit the data for ingestion to less than one GB per ingestion operation. If necessary, use multiple ingestion commands.

Supported ingestion properties

| Property | Type | Description |
|--|--|--|
| distributed | bool | If true, the command ingests from all nodes executing the query in parallel. Default is false. See performance tips. |
| creationTime | string | The datetime value, formatted as an ISO8601 string, to use as the creation time of the ingested data extents. If unspecified, now() is used. When specified, make sure the Lookback property in the target table’s effective Extents merge policy is aligned with the specified value. |
| extend_schema | bool | If true, the command might extend the schema of the table. Default is false. This option applies only to the .append, .set-or-append, and .set-or-replace commands. This option requires at least Table Admin permissions. |
| recreate_schema | bool | If true, the command might recreate the schema of the table. Default is false. This option applies only to the .set-or-replace command. This option takes precedence over the extend_schema property if both are set. This option requires at least Table Admin permissions. |
| folder | string | The folder to assign to the table. If the table already exists, this property overwrites the table’s folder. |
| ingestIfNotExists | string | If specified, ingestion fails if the table already has data tagged with an ingest-by: tag with the same value. For more information, see ingest-by: tags. |
| policy_ingestiontime | bool | If true, the Ingestion Time Policy is enabled on the table. The default is true. |
| tags | string | A JSON string that represents a list of tags to associate with the created extent. |
| docstring | string | A description used to document the table. |
| persistDetails | bool | If true, the command persists the detailed results for retrieval by the .show operation details command. Defaults to false. Example: with (persistDetails=true) |

Schema considerations

  • .set-or-replace preserves the schema unless one of extend_schema or recreate_schema ingestion properties is set to true.
  • .set-or-append and .append commands preserve the schema unless the extend_schema ingestion property is set to true.
  • Matching the result set schema to that of the target table is based on the column types. There’s no matching of column names. Make sure that the query result schema columns are in the same order as the table, otherwise data is ingested into the wrong columns (see the sketch after this list).
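
A minimal sketch of ordering the columns, assuming a target table named ErrorSummary with the columns (Timestamp:datetime, Level:string); the project operator forces the query output into that exact column order:

.set-or-append ErrorSummary <|
   LogsTable
   | project Timestamp, Level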

Character limitation

The command fails if the query generates an entity name with the $ character. The entity names must comply with the naming rules, so the $ character must be removed for the ingest command to succeed.

For example, in the following query, the search operator generates a column $table. To store the query results, use project-rename to rename the column.

.set Texas <| search State has 'Texas' | project-rename tableName=$table

Returns

Returns information on the extents created because of the .set or .append command.

Examples

Create and update table from query source

The following query creates the RecentErrors table with the same schema as LogsTable. It updates RecentErrors with all error logs from LogsTable over the last hour.

.set RecentErrors <|
   LogsTable
   | where Level == "Error" and Timestamp > now() - time(1h)

Create and update table from query source using the distributed flag

The following example creates a new table called OldExtents in the database, asynchronously. The dataset is expected to be bigger than one GB (more than ~one million rows) so the distributed flag is used. It updates OldExtents with ExtentId entries from the MyExtents table that were created more than 30 days ago.

.set async OldExtents with(distributed=true) <|
   MyExtents 
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Append data to table

The following example filters ExtentId entries in the MyExtents table that were created more than 30 days ago and appends the entries to the OldExtents table with associated tags.

.append OldExtents with(tags='["TagA","TagB"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Create or append a table with possibly existing tagged data

The following example either appends to or creates the OldExtents table asynchronously. It filters ExtentId entries in the MyExtents table that were created more than 30 days ago and specifies the tags to append to the new extents with ingest-by:myTag. The ingestIfNotExists parameter ensures that the ingestion only occurs if the data doesn’t already exist in the table with the specified tag.

.set-or-append async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
   MyExtents
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Create table or replace data with associated data

The following query replaces the data in the OldExtents table, or creates the table if it doesn’t already exist, with ExtentId entries in the MyExtents table that were created more than 30 days ago. Tag the new extent with ingest-by:myTag if the data doesn’t already exist in the table with the specified tag.

.set-or-replace async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Append data with associated data

The following example appends data to the OldExtents table asynchronously, using ExtentId entries from the MyExtents table that were created more than 30 days ago. It sets a specific creation time for the new extents.

.append async OldExtents with(creationTime='2017-02-13T11:09:36.7992775Z') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId     

Sample output

The following is a sample of the type of output you may see from your queries.

| ExtentId | OriginalSize | ExtentSize | CompressedSize | IndexSize | RowCount |
|--|--|--|--|--|--|
| 23a05ed6-376d-4119-b1fc-6493bcb05563 | 1291 | 5882 | 1568 | 4314 | 10 |

15 - Kusto.ingest into command (pull data from storage)

This article describes the .ingest into command (pull data from storage).

The .ingest into command ingests data into a table by “pulling” the data from one or more cloud storage files. For example, the command can retrieve 1,000 CSV-formatted blobs from Azure Blob Storage, parse them, and ingest them together into a single target table. Data is appended to the table without affecting existing records, and without modifying the table’s schema.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest [async] into table TableName SourceDataLocator [with ( IngestionPropertyName = IngestionPropertyValue [, …] )]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| async | string | | If specified, the command returns immediately and continues ingestion in the background. The results of the command include an OperationId value that can then be used with the .show operation command to retrieve the ingestion completion status and results. |
| TableName | string | ✔️ | The name of the table into which to ingest data. The table name is always relative to the database in context. If no schema mapping object is provided, the schema of the database in context is used. |
| SourceDataLocator | string | ✔️ | A single storage connection string or a comma-separated list of storage connection strings. A single connection string must refer to a single file hosted by a storage account. Ingestion of multiple files can be done by specifying multiple connection strings, or by ingesting from a query of an external table. |

Authentication and authorization

Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.

The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.

| Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
|--|--|--|
| Impersonation | Storage Blob Data Reader | Reader |
| Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
| Microsoft Entra access token | | |
| Storage account access key | | This authentication method isn’t supported in Gen1. |
| Managed identity | Storage Blob Data Reader | Reader |

Returns

The result of the command is a table with as many records as there are data shards (“extents”) generated by the command. If no data shards were generated, a single record is returned with an empty (zero-valued) extent ID.

| Name | Type | Description |
|--|--|--|
| ExtentId | guid | The unique identifier for the data shard that was generated by the command. |
| ItemLoaded | string | One or more storage files that are related to this record. |
| Duration | timespan | How long it took to perform the ingestion. |
| HasErrors | bool | Whether or not this record represents an ingestion failure. |
| OperationId | guid | A unique ID representing the operation. Can be used with the .show operation command. |

Examples

Azure Blob Storage with shared access signature

The following example instructs your database to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table T. The ... represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Obfuscated strings (the h in front of the string values) are used to ensure that the SAS is never recorded.

.ingest into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...',
    h'https://contoso.blob.core.windows.net/container/file2.csv?...'
)

Azure Blob Storage with managed identity

The following example shows how to read a CSV file from Azure Blob Storage and ingest its contents into table T using managed identity authentication. Authentication uses the managed identity ID (object ID) assigned to the Azure Blob Storage in Azure. For more information, see Create a managed identity for storage containers.

.ingest into table T ('https://StorageAccount.blob.core.windows.net/Container/file.csv;managed_identity=802bada6-4d21-44b2-9d15-e66b29e4d63e')

Azure Data Lake Storage Gen 2

The following example is for ingesting data from Azure Data Lake Storage Gen 2 (ADLSv2). The credentials used here (...) are the storage account credentials (shared key), and we use string obfuscation only for the secret part of the connection string.

.ingest into table T (
  'abfss://myfilesystem@contoso.dfs.core.windows.net/path/to/file1.csv;...'
)

Azure Data Lake Storage

The following example ingests a single file from Azure Data Lake Storage (ADLS). It uses the user’s credentials to access ADLS (so there’s no need to treat the storage URI as containing a secret). It also shows how to specify ingestion properties.

.ingest into table T ('adl://contoso.azuredatalakestore.net/Path/To/File/file1.ext;impersonate')
  with (format='csv')

Amazon S3 with an access key

The following example ingests a single file from Amazon S3 using an access key ID and a secret access key.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv;AwsCredentials=AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
  with (format='csv')

Amazon S3 with a presigned URL

The following example ingests a single file from Amazon S3 using a presigned URL. The command below is a sketch; the presigned URL query string is elided.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv?...')
  with (format='csv')

16 - Streaming ingestion

16.1 - Clearing cached schema for streaming ingestion

This article describes the management command for clearing the cached database schema.

Nodes cache the schema of the databases that receive data via streaming ingestion. This process optimizes performance and utilization of resources, but can cause propagation delays when the schema changes.

Clear the cache to guarantee that subsequent streaming ingestion requests incorporate database or table schema changes. For more information, see Streaming ingestion and schema changes.

Permissions

You must have at least Database Ingestor permissions to run this command.

Syntax

.clear table TableName cache streamingingestion schema

.clear database cache streamingingestion schema

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| TableName | string | ✔️ | The name of the table for which to clear the cache. |

Returns

This command returns a table with the following columns:

| Column | Type | Description |
|--|--|--|
| NodeId | string | Identifier of the node. |
| Status | string | Succeeded/Failed |

Example

.clear database cache streamingingestion schema

.clear table T1 cache streamingingestion schema

| NodeId | Status |
|--|--|
| Node1 | Succeeded |
| Node2 | Failed |

16.2 - Streaming ingestion and schema changes

This article discusses options of handling schema changes with streaming ingestion.

Cluster or Eventhouse nodes cache the schema of databases that get data through streaming ingestion, boosting performance and resource use. However, when there are schema changes, it can lead to delays in updates.

If schema changes and streaming ingestion aren’t synchronized, you can encounter failures like schema-related errors or incomplete and distorted data in the table.

This article outlines typical schema changes and provides guidance on avoiding problems with streaming ingestion during these changes.

Schema changes

The following list covers key examples of schema changes:

Coordinate schema changes with streaming ingestion

The schema cache is kept while the database is online. If there are schema changes, the system automatically refreshes the cache, but this refresh can take several minutes. If you rely on the automatic refresh, you can experience uncoordinated ingestion failures.

You can reduce the effects of propagation delay by explicitly clearing the schema cache on the nodes. If the streaming ingestion flow and schema changes are coordinated, you can completely eliminate failures and their associated data distortion.

To coordinate the streaming ingestion flow with schema changes:

  1. Suspend streaming ingestion.
  2. Wait until all outstanding streaming ingestion requests are complete.
  3. Make the schema changes (see the sketch after this list).
  4. Issue one or several .clear cache streaming ingestion schema commands.
    • Repeat until successful and all rows in the command output indicate success
  5. Resume streaming ingestion.
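
As an illustration of steps 3 and 4, assuming the schema change is adding a column to a table named MyStreamTable, the pair of commands might look like this sketch:

.alter-merge table MyStreamTable (NewColumn: string)

.clear table MyStreamTable cache streamingingestion schema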