1 - AVRO Mapping

Learn how to use AVRO mapping to map data to columns inside tables upon ingestion.

Use AVRO mapping to map incoming data to columns inside tables when your ingestion source file is in AVRO format.

Each AVRO mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the AVRO record. |
| Path | string | If the value starts with $, it’s treated as the path to the field in the AVRO document. This path specifies the part of the AVRO document that becomes the content of the column in the table. The path that denotes the entire AVRO record is $. If the value doesn’t start with $, it’s treated as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the AVRO file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

JSON serialization

The following example mapping is serialized as a JSON string when provided as part of the .ingest management command.

[
  {"Column": "event_timestamp", "Properties": {"Field": "Timestamp"}},
  {"Column": "event_name",      "Properties": {"Field": "Name"}},
  {"Column": "event_type",      "Properties": {"Field": "Type"}},
  {"Column": "event_time",      "Properties": {"Field": "Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

Here the serialized JSON mapping is included in the context of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "AVRO",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Field": "Field1"}},
        {"Column": "column_b", "Properties": {"Field": "$.[\'Field name with space\']"}}
      ]
      ```
  )

Precreated mapping

When the mapping is precreated, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="AVRO",
        ingestionMappingReference = "Mapping_Name"
    )
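
If the mapping isn’t created yet, the following sketch shows one way to precreate it, following the same pattern as the JSON mapping creation command shown in the ingestion mappings article. The table name, mapping name, and mapping body are illustrative.

.create table Table123 ingestion avro mapping "Mapping_Name"
  ```
  [
    {"Column": "column_a", "Properties": {"Field": "Field1"}}
  ]
  ```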

Identity mapping

Use AVRO mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="AVRO"
    )

2 - CSV Mapping

Learn how to use CSV mapping to map data to columns inside tables upon ingestion.

Use CSV mapping to map incoming data to columns inside tables when your ingestion source file is any of the following delimiter-separated tabular formats: CSV, TSV, PSV, SCSV, SOHsv, TXT and RAW. For more information, see supported data formats.

Each CSV mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Ordinal | int | The column order number in the CSV file. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the CSV file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. The only supported transformation is SourceLocation. |

Examples

[
  {"Column": "event_time", "Properties": {"Ordinal": "0"}},
  {"Column": "event_name", "Properties": {"Ordinal": "1"}},
  {"Column": "event_type", "Properties": {"Ordinal": "2"}},
  {"Column": "ingestion_time", "Properties": {"ConstValue": "2023-01-01T10:32:00"}}
  {"Column": "source_location", "Properties": {"Transform": "SourceLocation"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv",
        ingestionMapping =
        ```
        [
            {"Column": "event_time", "Properties": {"Ordinal": "0"}},
            {"Column": "event_name", "Properties": {"Ordinal": "1"}},
            {"Column": "event_type", "Properties": {"Ordinal": "2"}},
            {"Column": "ingestion_time", "Properties": {"ConstValue": "2023-01-01T10:32:00"}},
            {"Column": "source_location", "Properties": {"Transform": "SourceLocation"}}
        ]
        ```
    )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv",
        ingestionMappingReference = "MappingName"
    )

Identity mapping

Use CSV mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv"
    )

3 - Ingestion mappings

This article describes ingestion mappings.

Ingestion mappings are used during ingestion to map incoming data to columns inside tables.

Data Explorer supports different types of mappings, both row-oriented (CSV, JSON, AVRO and W3CLOGFILE), and column-oriented (Parquet and ORC).

Ingestion mappings can be defined in the ingest command, or can be precreated and referenced from the ingest command using the ingestionMappingReference parameter. Ingestion is possible without specifying a mapping. For more information, see identity mapping.

Each element in the mapping list is constructed from three fields:

| Property | Required | Description |
|--|--|--|
| Column | ✔️ | Target column name in the table. |
| Datatype | | Datatype with which to create the mapped column if it doesn’t already exist in the table. |
| Properties | | Property-bag containing properties specific for each mapping as described in each specific mapping type page. |
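
For example, a single mapping element that maps a JSON field into a datetime column, creating the column if it doesn’t already exist, might look like the following sketch (the column name, datatype, and path are illustrative):

[
  {"Column": "event_time", "Datatype": "datetime", "Properties": {"Path": "$.timestamp"}}
]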

Supported mapping types

The following table defines mapping types to be used when ingesting or querying external data of a specific format.

| Data Format | Mapping Type |
|--|--|
| CSV | CSV Mapping |
| TSV | CSV Mapping |
| TSVe | CSV Mapping |
| PSV | CSV Mapping |
| SCSV | CSV Mapping |
| SOHsv | CSV Mapping |
| TXT | CSV Mapping |
| RAW | CSV Mapping |
| JSON | JSON Mapping |
| AVRO | AVRO Mapping |
| APACHEAVRO | AVRO Mapping |
| Parquet | Parquet Mapping |
| ORC | ORC Mapping |
| W3CLOGFILE | W3CLOGFILE Mapping |

Ingestion mapping examples

The following examples use the RawEvents table with the following schema:

.create table RawEvents (timestamp: datetime, deviceId: guid, messageId: guid, temperature: decimal, humidity: decimal) 

Simple mapping

The following example shows ingestion where the mapping is defined in the ingest command. The command ingests a JSON file from a URL into the RawEvents table. The mapping specifies the path to each field in the JSON file.

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
    with (
            format = "json",
            ingestionMapping =
            ```
            [ 
              {"column":"timestamp","Properties":{"path":"$.timestamp"}},
              {"column":"deviceId","Properties":{"path":"$.deviceId"}},
              {"column":"messageId","Properties":{"path":"$.messageId"}},
              {"column":"temperature","Properties":{"path":"$.temperature"}},
              {"column":"humidity","Properties":{"path":"$.humidity"}}
            ]
            ```
          )

Mapping with ingestionMappingReference

To map the same JSON file using a precreated mapping, create the RawEventMapping ingestion mapping reference with the following command:

.create table RawEvents ingestion json mapping 'RawEventMapping' 
  ```
  [ 
    {"column":"timestamp","Properties":{"path":"$.timestamp"}},
    {"column":"deviceId","Properties":{"path":"$.deviceId"}},
    {"column":"messageId","Properties":{"path":"$.messageId"}},
    {"column":"temperature","Properties":{"path":"$.temperature"}},
    {"column":"humidity","Properties":{"path":"$.humidity"}}
  ]
  ```

Ingest the JSON file using the RawEventMapping ingestion mapping reference with the following command:

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
  with (
          format="json",
          ingestionMappingReference="RawEventMapping"
        )

Identity mapping

Ingestion is possible without specifying the ingestionMapping or ingestionMappingReference properties. The data is mapped using an identity data mapping derived from the table’s schema. The table schema remains the same. The format property should be specified. See ingestion formats.

| Format type | Format | Mapping logic |
|--|--|--|
| Tabular data formats with a defined order of columns, such as delimiter-separated or single-line formats. | CSV, TSV, TSVe, PSV, SCSV, Txt, SOHsv, Raw | All table columns are mapped, in their respective order, to the data columns in the order they appear in the data source. Column data type is taken from the table schema. |
| Formats with named columns or records with named fields. | JSON, Parquet, Avro, ApacheAvro, Orc, W3CLOGFILE | All table columns are mapped to data columns or record fields having the same name (case-sensitive). Column data type is taken from the table schema. |
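
For example, assuming a JSON source whose field names match the target table’s column names, an identity-mapped ingestion needs only the format property. The following sketch reuses the sample file and RawEvents table from the examples above:

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json')
  with (format="json")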

Mapping transformations

Some of the data format mappings (Parquet, JSON, and AVRO) support simple and useful ingest-time transformations. If the scenario requires more complex processing at ingest time, use an update policy, which allows defining lightweight processing using KQL expressions.

| Path-dependent transformation | Description | Conditions |
|--|--|--|
| PropertyBagArrayToDictionary | Transforms a JSON array of properties, such as {events:[{"n1":"v1"},{"n2":"v2"}]}, to a dictionary and serializes it to a valid JSON document, such as {"n1":"v1","n2":"v2"}. | Available for JSON, Parquet, AVRO, and ORC mapping types. |
| SourceLocation | Name of the storage artifact that provided the data, type string (for example, the blob’s "BaseUri" field). | Available for CSV, JSON, Parquet, AVRO, ORC, and W3CLOGFILE mapping types. |
| SourceLineNumber | Offset relative to that storage artifact, type long (starting with 1 and incrementing per new record). | Available for CSV, JSON, Parquet, AVRO, ORC, and W3CLOGFILE mapping types. |
| DateTimeFromUnixSeconds | Converts a number representing unix-time (seconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DateTimeFromUnixMilliseconds | Converts a number representing unix-time (milliseconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DateTimeFromUnixMicroseconds | Converts a number representing unix-time (microseconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DateTimeFromUnixNanoseconds | Converts a number representing unix-time (nanoseconds since 1970-01-01) to a UTC datetime string. | Available for CSV, JSON, Parquet, AVRO, and ORC mapping types. |
| DropMappedFields | Maps an object in the JSON document to a column and removes any nested fields already referenced by other column mappings. | Available for JSON, Parquet, AVRO, and ORC mapping types. |
| BytesAsBase64 | Treats the data as a byte array and converts it to a base64-encoded string. | Available for the AVRO mapping type. For the ApacheAvro format, the schema type of the mapped data field should be the bytes or fixed Avro type. For the Avro format, the field should be an array containing byte values from the [0-255] range. null is ingested if the data doesn’t represent a valid byte array. |

Mapping transformation examples

DropMappedFields transformation:

Given the following JSON contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "EventName": "CustomEvent",
        "Revenue": 0.456
    }
}

The following data mapping maps the entire Props object into the dynamic column Props while excluding fields that are already mapped by other columns (Props.EventName is already mapped into the EventName column, so it’s excluded).

[
    { "Column": "Time", "Properties": { "Path": "$.Time" } },
    { "Column": "EventName", "Properties": { "Path": "$.Props.EventName" } },
    { "Column": "Props", "Properties": { "Path": "$.Props", "Transform":"DropMappedFields" } },
]

The ingested data looks as follows:

| Time | EventName | Props |
|--|--|--|
| 2012-01-15T10:45 | CustomEvent | {"Revenue": 0.456} |

BytesAsBase64 transformation

Given the following AVRO file contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "id": [227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52]
    }
}

The following data mapping maps the ID column twice, with and without the transformation.

[
    { "Column": "ID", "Properties": { "Path": "$.props.id" } },
    { "Column": "Base64EncodedId", "Properties": { "Path": "$.props.id", "Transform":"BytesAsBase64" } },
]

The ingested data looks as follows:

| ID | Base64EncodedId |
|--|--|
| [227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52] | 44MiXBxbQUiGigmFMy1oNA== |

4 - JSON Mapping

Learn how to use JSON mapping to map data to columns inside tables upon ingestion.

Use JSON mapping to map incoming data to columns inside tables when your ingestion source file is in JSON format.

Each JSON mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Path | string | If the value starts with $, it’s interpreted as the JSON path to the field in the JSON document that will become the content of the column in the table. The JSON path that denotes the entire document is $. If the value doesn’t start with $, it’s interpreted as a constant value. JSON paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the JSON file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "source_uri",      "Properties": {"Transform": "SourceLocation"}},
  {"Column": "source_line",     "Properties": {"Transform": "SourceLineNumber"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "json",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Path": "$.Obj.Property"}},
        {"Column": "column_b", "Properties": {"Path": "$.Property"}},
        {"Column": "custom_column", "Properties": {"Path": "$.[\'Property name with space\']"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="json",
        ingestionMappingReference = "Mapping_Name"
    )

Identity mapping

Use JSON mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="json"
    )

Copying JSON mapping

You can copy JSON mapping of an existing table and create a new table with the same mapping using the following process:

  1. Run the following command on the table whose mapping you want to copy:

    .show table TABLENAME ingestion json mappings
    | extend formatted_mapping = strcat("'",replace_string(Mapping, "'", "\\'"),"'")
    | project formatted_mapping
    
  2. Use the output of the above command to create a new table with the same mapping:

    .create table TABLENAME ingestion json mapping "TABLENAME_Mapping" RESULT_OF_ABOVE_CMD
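
As an illustration of step 2, if the first command returns a formatted mapping such as '[{"column":"timestamp","Properties":{"path":"$.timestamp"}}]', the command to create the mapping on a hypothetical table named NewTable might look like the following sketch:

    .create table NewTable ingestion json mapping "NewTable_Mapping" '[{"column":"timestamp","Properties":{"path":"$.timestamp"}}]'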
    

5 - ORC Mapping

Learn how to use ORC mapping to map data to columns inside tables upon ingestion.

Use ORC mapping to map incoming data to columns inside tables when your ingestion source file is in ORC format.

Each ORC mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the ORC record. |
| Path | string | If the value starts with $, it’s interpreted as the path to the field in the ORC document that will become the content of the column in the table. The path that denotes the entire ORC record is $. If the value doesn’t start with $, it’s interpreted as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the ORC file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "orc",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Path": "$.Field1"}},
        {"Column": "column_b", "Properties": {"Path": "$.[\'Field name with space\']"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="orc",
        ingestionMappingReference = "ORC_Mapping"
    )

Identity mapping

Use ORC mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="orc"
    )

6 - Parquet Mapping

Learn how to use Parquet mapping to map data to columns inside tables upon ingestion and optimize data processing in Kusto.

Use Parquet mapping to map incoming data to columns inside tables when your ingestion source file is in Parquet format.

Each Parquet mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the Parquet record. |
| Path | string | If the value starts with $, it’s interpreted as the path to the field in the Parquet document that will become the content of the column in the table. The path that denotes the entire Parquet record is $. If the value doesn’t start with $, it’s interpreted as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the Parquet file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Parquet type conversions

Comprehensive support is provided for converting data types when you’re ingesting or querying data from a Parquet source.

The following table maps Parquet field types to the table column types they can be converted to. The first column lists the Parquet type, and the remaining columns indicate the target column types.

Parquet typeboolintlongrealdecimaldatetimetimespanstringguiddynamic
INT8✔️✔️✔️✔️✔️✔️
INT16✔️✔️✔️✔️✔️✔️
INT32✔️✔️✔️✔️✔️✔️
INT64✔️✔️✔️✔️✔️✔️
UINT8✔️✔️✔️✔️✔️✔️
UINT16✔️✔️✔️✔️✔️✔️
UINT32✔️✔️✔️✔️✔️✔️
UINT64✔️✔️✔️✔️✔️
FLOAT32✔️✔️✔️✔️✔️✔️
FLOAT64✔️✔️✔️✔️✔️✔️
BOOLEAN✔️✔️✔️
DECIMAL (I32)✔️✔️✔️✔️✔️✔️
DECIMAL (I64)✔️✔️✔️✔️✔️✔️
DECIMAL (FLBA)✔️✔️✔️✔️
DECIMAL (BA)✔️✔️✔️✔️✔️✔️
TIMESTAMP✔️✔️
DATE✔️✔️
STRING✔️✔️✔️✔️✔️✔️
UUID✔️✔️
JSON✔️✔️
LIST✔️
MAP✔️
STRUCT✔️

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format = "parquet",
    ingestionMapping =
    ```
    [
      {"Column": "column_a", "Properties": {"Path": "$.Field1.Subfield"}},
      {"Column": "column_b", "Properties": {"Path": "$.[\'Field name with space\']"}},
    ]
    ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format="parquet",
      ingestionMappingReference = "Mapping_Name"
  )

Identity mapping

Use Parquet mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format="parquet"
  )

7 - Queued ingestion commands use case

Learn how to ingest historical data using the queued ingestion commands.

The queued ingestion commands allow you to ingest individual blobs by URL or ingest batches of data by listing folders or containers. This article walks through a common use case: fine-tuning the ingestion of historical data. You can use these commands to test how historical data is ingested and resolve any issues before performing full ingestion. The following tasks demonstrate how to use queued ingestion commands effectively:

Ingest single blobs (Preview)

You can start by ingesting a single blob directly using its URL. Make sure to include a SAS token or use a managed identity to grant the service permission to access and download the blob.

.ingest-from-storage-queued into table database('TestDatabase').Logs
with (format='csv', enableTracking=true)
<|
'https://sample.blob.core.windows.net/sample/test_*csv?...'

Output

| IngestionOperationId | ClientRequestId | OperationInfo |
|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | Kusto.Web.KWE,Query;11112222;11112222;22223333-bbbb-3333-cccc-4444cccc5555 | .show queued ingestion operations "00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444" |

List blobs in a folder

To understand the historical data better, you list a maximum of 10 blobs from the Azure blob storage container.

.list blobs (
    "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
)
MaxFiles=10

Output

| BlobUri | SizeInBytes | CapturedVariables |
|--|--|--|
| https://<BlobStorageLocation>/<FolderName>/part-100.parquet | 7,429,062 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-101.parquet | 262,610 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-102.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-103.parquet | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-104.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-105.parquet | 7,441,587 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-106.parquet | 1,087,425 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-107.parquet | 6,238,357 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-208.csv | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-109.parquet | 6,338,148 | {} |

You can now verify that these are the correct blobs to ingest.

Ingest folder

Next, you queue the 10 listed files for ingestion into the Logs table in the TestDatabase database, with tracking enabled for the ingestion.

.ingest-from-storage-queued into table database('TestDatabase').Logs
EnableTracking=true
with (format='parquet')
<|
    .list blobs (
        "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
    )
    MaxFiles=10

Output

| IngestionOperationId | ClientRequestId | OperationInfo |
|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | Kusto.Web.KWE,Query;11112222;11112222;22223333-bbbb-3333-cccc-4444cccc5555 | .show queued ingestion operations "00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444" |

The OperationInfo, which includes the IngestionOperationId, is then used to track the ingestion status.

Track ingestion status

You run the .show queued ingestion operations command to check whether the ingestion is complete or if there are any errors.

.show queued ingestion operations "00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444"

Output

| IngestionOperationId | Started On | Last Updated On | State | Discovered | InProgress | Ingested | Failed | Canceled | SampleFailedReasons | Database | Table |
|--|--|--|--|--|--|--|--|--|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | 2025-03-19 14:57:41.0000000 | 2025-01-10 15:15:04.0000000 | Completed | 10 | 0 | 10 | 0 | 0 | | TestDatabase | Logs |

If the State isn’t Completed, run the .show queued ingestion operations command again to monitor the increase in the number of ingested blobs until the State changes to Completed. You can also cancel the ingestion, if necessary.

Filter queued files for ingestion

After examining the results of the ingestion, you list the blobs for ingestion again. This time the parquet suffix is added to ensure that only Parquet files are queued.

.list blobs (
    "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
)
Suffix="parquet"
MaxFiles=10

Output

| BlobUri | SizeInBytes | CapturedVariables |
|--|--|--|
| https://<BlobStorageLocation>/<FolderName>/part-100.parquet | 7,429,062 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-101.parquet | 262,610 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-102.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-103.parquet | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-104.parquet | 6,154,166 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-105.parquet | 7,441,587 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-106.parquet | 1,087,425 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-107.parquet | 6,238,357 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-108.parquet | 7,460,408 | {} |
| https://<BlobStorageLocation>/<FolderName>/part-109.parquet | 6,338,148 | {} |

Capture the creation time

A path format is added to capture the creation time.

.list blobs (
    "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
)
Suffix="parquet"
MaxFiles=10
PathFormat=("output/03/Year=" datetime_pattern("yyyy'/Month='MM'/Day='dd", creationTime) "/")

Output

| BlobUri | SizeInBytes | CapturedVariables |
|--|--|--|
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-100.parquet | 7,429,062 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-101.parquet | 262,610 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-102.parquet | 6,154,166 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-103.parquet | 7,460,408 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-104.parquet | 6,154,166 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-105.parquet | 7,441,587 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-106.parquet | 1,087,425 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-107.parquet | 6,238,357 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-108.parquet | 7,460,408 | {"creationTime": "03/20/2025 00:00:00"} |
| https://<BlobStorageLocation>/<FolderName>/output/03/Year=2025/Month=03/Day=20/Hour=00/part-109.parquet | 6,338,148 | {"creationTime": "03/20/2025 00:00:00"} |

The CapturedVariables column dates match the dates specified in the BlobUri column.

Ingest 20 files

Now 20 files in parquet format are ingested from the Azure blob storage container, along with their creation time.

.ingest-from-storage-queued into table database('TestDatabase').Logs
EnableTracking=true
with (format='parquet')
<|
    .list blobs (
        "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
    )
    Suffix="parquet"
    MaxFiles=20
    PathFormat=("output/03/Year=" datetime_pattern("yyyy'/Month='MM'/Day='dd", creationTime) "/")

Output

| IngestionOperationId | ClientRequestId | OperationInfo |
|--|--|--|
| 22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555 | Kusto.Web.KWE,Query;22223333;22223333;33334444-dddd-4444-eeee-5555eeee5555 | .show queued ingestion operations "22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555" |

The OperationInfo is then used to track the ingestion status.

Track follow-up ingestion status

The .show queued ingestion operations command is run to check whether there are any issues with this ingestion.

.show queued ingestion operations "22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555"

Output

| IngestionOperationId | Started On | Last Updated On | State | Discovered | InProgress | Ingested | Failed | Canceled | SampleFailedReasons | Database | Table |
|--|--|--|--|--|--|--|--|--|--|--|--|
| 22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555 | 2025-02-20 14:57:41.0000000 | | InProgress | 10 | 10 | 0 | 0 | 0 | | TestDatabase | Logs |

The .show extents command is run to check whether the extents were created with the earlier creation date, preserving data integrity and historical accuracy.

.show table Logs extents

The MinCreatedOn and MaxCreatedOn values should show the data creation time, rather than the data ingestion time. For more information about these returns, see .show extents.

You can cancel the ingestion, if necessary.

Perform your full ingestion

By running queued ingestion commands on a sample, you can discover problems your ingestion might encounter. Now that you’ve fixed them, you’re ready to ingest all your historical data and wait for the full ingestion to complete.
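
A minimal sketch of that full ingestion, assuming the MaxFiles limit can simply be omitted so that all matching blobs in the folder are queued (storage location and path format as in the earlier steps):

.ingest-from-storage-queued into table database('TestDatabase').Logs
EnableTracking=true
with (format='parquet')
<|
    .list blobs (
        "https://<BlobStorageLocation>/<FolderName>;managed_identity=system"
    )
    Suffix="parquet"
    PathFormat=("output/03/Year=" datetime_pattern("yyyy'/Month='MM'/Day='dd", creationTime) "/")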

Cancel ingestion

At any time during the ingestion process, you can cancel your queued ingestion.

.cancel queued ingestion operation '22223333;22223333;11110000-bbbb-2222-cccc-4444dddd5555'

Output

| IngestionOperationId | Started On | Last Updated On | State | Discovered | Pending | Canceled | Ingested | Failed | SampleFailedReasons | Database | Table |
|--|--|--|--|--|--|--|--|--|--|--|--|
| 00001111;11112222;00001111-aaaa-2222-bbbb-3333cccc4444 | 2025-03-20 15:03:11.0000000 | | Canceled | 10 | 10 | 0 | 0 | 0 | | TestDatabase | Logs |

You can then roll back the ingestion, fix the issues, and rerun the ingestion.

8 - Queued ingestion overview commands

Learn about queued ingestion and its commands.

Queued ingestion commands allow you to ingest specific folders or an entire container, and to manage the operations related to queued ingestion. You can also ingest multiple or individual blobs by URL and from a source file. The ingestion commands are useful for preparing and testing distinct ingestion scenarios before the final ingestion. Using them helps to ensure that fields, columns, partitioning, and other needs are handled properly during ingestion.

Management commands

Queued storage commands include the commands used in the preceding use case:

  • .ingest-from-storage-queued
  • .list blobs
  • .show queued ingestion operations
  • .cancel queued ingestion operation

9 - W3CLOGFILE Mapping

Learn how to use W3CLOGFILE mapping to map data to columns inside tables upon ingestion.

Use W3CLOGFILE mapping to map incoming data to columns inside tables when your ingestion source file is in W3CLOGFILE format.

Each W3CLOGFILE mapping element can contain the following optional properties:

| Property | Type | Description |
|--|--|--|
| Field | string | Name of the field in the W3CLOGFILE log record. |
| ConstValue | string | The constant value to be used for a column instead of some value inside the W3CLOGFILE file. |
| Transform | string | Transformation that should be applied on the content with mapping transformations. |

Examples

[
   {"Column": "Date",          "Properties": {"Field": "date"}},
   {"Column": "Time",          "Properties": {"Field": "time"}},
   {"Column": "IP",            "Properties": {"Field": "s-ip"}},
   {"Column": "ClientMethod",  "Properties": {"Field": "cs-method"}},
   {"Column": "ClientQuery",   "Properties": {"Field": "cs-uri-query"}},
   {"Column": "ServerPort",    "Properties": {"Field": "s-port"}},
   {"Column": "ClientIP",      "Properties": {"Field": "c-ip"}},
   {"Column": "UserAgent",     "Properties": {"Field": "cs(User-Agent)"}},
   {"Column": "Referer",       "Properties": {"Field": "cs(Referer)"}},
   {"Column": "Status",        "Properties": {"Field": "sc-status"}},
   {"Column": "ResponseBytes", "Properties": {"Field": "sc-bytes"}},
   {"Column": "RequestBytes",  "Properties": {"Field": "cs-bytes"}},
   {"Column": "TimeTaken",     "Properties": {"Field": "time-taken"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "w3clogfile",
      ingestionMapping =
      ```
      [
         {"Column": "column_a", "Properties": {"Field": "field1"}},
         {"Column": "column_b", "Properties": {"Field": "field2"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="w3clogfile",
        ingestionMappingReference = "Mapping_Name"
    )

Identity mapping

Use W3CLOGFILE mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="w3clogfile"
    )

10 - .ingest inline command (push)

This article describes the .ingest inline command (push).

This command inserts data into a table by pushing the data included within the command to the table.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] <| Data

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] [ Data ]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| TableName | string | ✔️ | The name of the table into which to ingest data. The table name is always relative to the database in context. Its schema is the default schema assumed for the data if no schema mapping object is provided. |
| Data | string | ✔️ | The data content to ingest. Unless otherwise modified by the ingestion properties, this content is parsed as CSV. |
| IngestionPropertyName, IngestionPropertyValue | string | | Any number of ingestion properties that affect the ingestion process. |

Returns

The result is a table with as many records as the number of generated data shards (“extents”). If no data shards are generated, a single record is returned with an empty (zero-valued) extent ID.

| Name | Type | Description |
|--|--|--|
| ExtentId | guid | The unique identifier for the data shard that’s generated by the command. |

Examples

Ingest with <| syntax

The following command ingests data into a table Purchases with two columns: SKU (of type string) and Quantity (of type long).

.ingest inline into table Purchases <|
    Shoes,1000
    Wide Shoes,50
    "Coats black",20
    "Coats with ""quotes""",5

Ingest with bracket syntax

The following command ingests data into a table Logs with two columns: Date (of type datetime) and EventDetails (of type dynamic).

.ingest inline into table Logs
    [2015-01-01,"{""EventType"":""Read"", ""Count"":""12""}"]
    [2015-01-01,"{""EventType"":""Write"", ""EventValue"":""84""}"]

11 - .show data operations

Learn how to use the .show data operations command to return data operations that reached a final state.

Returns a table with data operations that reached a final state. Data operations are available for 30 days from when they ran.

Any operation that results in new extents (data shards) added to the system is considered a data operation.

Permissions

You must have Database Admin or Database Monitor permissions to see any data operations invoked on your database.

Any user can see their own data operations.

For more information, see Kusto role-based access control.

Syntax

.show data operations

Returns

This command returns a table with the following columns:

| Output parameter | Type | Description |
|--|--|--|
| Timestamp | datetime | The time when the operation reached its final state. |
| Database | string | The database name. |
| Table | string | The table name. |
| ClientActivityId | string | The operation client activity ID. |
| OperationKind | string | One of BatchIngest, SetOrAppend, RowStoreSeal, MaterializedView, QueryAcceleration, and UpdatePolicy. |
| OriginalSize | long | The original size of the ingested data. |
| ExtentSize | long | The extent size. |
| RowCount | long | The number of rows in the extent. |
| ExtentCount | int | The number of extents. |
| TotalCpu | timespan | The total CPU time used by the data operation. |
| Duration | timespan | The duration of the operation. |
| Principal | string | The identity that initiated the data operation. |
| Properties | dynamic | Additional information about the data operation. |

Example

The following example returns information about UpdatePolicy, BatchIngest, and SetOrAppend operations.

.show data operations

Output

| Timestamp | Database | Table | ClientActivityId | OperationKind | OriginalSize | ExtentSize | RowCount | ExtentCount | TotalCpu | Duration | Principal | Properties |
|--|--|--|--|--|--|--|--|--|--|--|--|--|
| 2024-07-18 15:21:10.5432134 | TestLogs | UTResults | DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1 | UpdatePolicy | 100,829 | 75,578 | 279 | 1 | 00:00:00.2656250 | 00:00:28.9101535 | aadapp=xxx | {"SourceTable": "UTLogs"} |
| 2024-07-18 15:21:12.9481819 | TestLogs | UTLogs | DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1 | BatchIngest | 1,045,027,298 | 123,067,947 | 1,688,705 | 2 | 00:00:22.9843750 | 00:00:29.9745733 | aadapp=xxx | {"Format": "Csv", "NumberOfInputStreams": 2} |
| 2024-07-18 15:21:16.1095441 | KustoAuto | Incident | KustoGPTSummary;cdef12345-6789-ghij-0123-klmn45678 | SetOrAppend | 1,420 | 3,190 | 1 | 1 | 00:00:00.0156250 | 00:00:00.0638211 | aaduser=xxx | |
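
Because the command returns a table, its output can be piped into query operators to narrow the results. The following sketch (the filter values are illustrative) keeps only batch ingestions into a single database:

.show data operations
| where Database == "TestLogs" and OperationKind == "BatchIngest"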

12 - Data formats supported for ingestion

Learn about the various data and compression formats supported for ingestion.

Data ingestion adds data to a table and makes it available for query. For all ingestion methods, other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that are supported for data ingestion.

To learn why ingestion might fail, see Ingestion failures and Ingestion error codes in Azure Data Explorer.

| Format | Extension | Description |
|--|--|--|
| ApacheAvro | .avro | An Avro format that supports logical types. Supported compression codecs: null, deflate, and snappy. The reader implementation of the apacheavro format is based on the official Apache Avro library. For details on ingesting Event Hubs Capture Avro files, see Ingesting Event Hubs Capture Avro files. |
| Avro | .avro | A legacy implementation of the Avro format based on the .NET library. Supported compression codecs: null and deflate. To use snappy, use the ApacheAvro data format. |
| AzMonStream | N/A | Azure Monitor exports data in this format to Azure Event Hubs. This format is supported only by Azure Event Hubs. |
| CSV | .csv | A text file with comma-separated values (,). See RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files. |
| JSON | .json | A text file with JSON objects delimited by \n or \r\n. See JSON Lines (JSONL). |
| MultiJSON | .multijson | A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, \n, or \r\n. Each property bag can span multiple lines. |
| ORC | .orc | An ORC file. |
| Parquet | .parquet | A Parquet file. |
| PSV | .psv | A text file with pipe-separated values (\|). |
| RAW | .raw | A text file whose entire contents are a single string value. |
| SCsv | .scsv | A text file with semicolon-separated values (;). |
| SOHsv | .sohsv | A text file with SOH-separated values. (SOH is ASCII codepoint 1; this format is used by Hive on HDInsight.) |
| TSV | .tsv | A text file with tab-separated values (\t). |
| TSVE | .tsv | A text file with tab-separated values (\t). A backslash character (\) is used for escaping. |
| TXT | .txt | A text file with lines delimited by \n. Empty lines are skipped. |
| W3CLOGFILE | .log | Web log file format standardized by the W3C. |

For more information about ingesting data by using the json or multijson formats, see Ingest JSON formats.

Supported data compression formats

Compress blobs and files with these algorithms:

| Compression | Extension |
|--|--|
| gzip | .gz |
| zip | .zip |

Indicate compression by appending the extension to the blob or file name.

For example:

  • MyData.csv.zip indicates a blob or file formatted as CSV, compressed with zip (archive or single file).
  • MyData.json.gz indicates a blob or file formatted as JSON, compressed with gzip.

Blob or file names that include only the compression extension (for example, MyData.zip) are also supported. In this case, specify the file format as an ingestion property because it can’t be inferred.
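
For example, a blob that carries only the compression extension could be ingested with the format stated explicitly. This is a sketch; the table name and URL are placeholders.

.ingest into table T ('https://contoso.blob.core.windows.net/container/MyData.zip?...')
  with (format='csv')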

13 - Data ingestion properties

Learn about the various data ingestion properties.

Data ingestion adds data to a table and makes it available for query. Add properties to the ingestion command after the with keyword.
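
For example, a sketch of an ingestion command that passes the format and a precreated mapping reference as ingestion properties (the table name, URL, and mapping name are placeholders):

.ingest into table MyTable ('https://contoso.blob.core.windows.net/container/file.json?...')
  with (format='json', ingestionMappingReference='MyTable_Mapping')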

14 - Ingest from query

14.1 - .cancel operation command

Learn how to use the .cancel operation command to cancel a long-running operation.

This command cancels a long-running ingest from query operation. This command is useful when the operation is taking too long and you would like to abort it while running.

The cancel operation command isn’t guaranteed to succeed. The output of the .cancel operation command indicates whether or not cancellation was successful.

Syntax

.cancel operation OperationId [with ( reason = ReasonPhrase )]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| OperationId | guid | ✔️ | A guid of the operation ID returned from the running command. |
| ReasonPhrase | string | | The reason for canceling the running command. |

Returns

| Output parameter | Type | Description |
|--|--|--|
| OperationId | guid | The operation ID of the operation that was canceled. |
| Operation | string | The operation kind that was canceled. |
| StartedOn | datetime | The start time of the operation that was canceled. |
| CancellationState | string | Returns one of the following options: Canceled successfully: the operation was canceled. Cancel failed: the operation can’t be canceled at this point. The operation may still be running or may have completed. |
| ReasonPhrase | string | The reason why cancellation wasn’t successful. |

Example

.cancel operation 078b2641-f10d-4694-96f8-1ee2b75dda48 with(Reason="Command canceled by me")

| OperationId | Operation | StartedOn | CancellationState | ReasonPhrase |
|--|--|--|--|--|
| c078b2641-f10d-4694-96f8-1ee2b75dda48 | TableSetOrAppend | 2022-07-18 09:03:55.1387320 | Canceled successfully | Command canceled by me |

14.2 - Kusto query ingestion (set, append, replace)

Learn how to use the .set, .append, .set-or-append, and .set-or-replace commands to ingest data from a query.

These commands execute a query or a management command and ingest the results of the query into a table. The difference between these commands is how they treat existing or nonexistent tables and data.

| Command | If table exists | If table doesn’t exist |
|--|--|--|
| .set | The command fails. | The table is created and data is ingested. |
| .append | Data is appended to the table. | The command fails. |
| .set-or-append | Data is appended to the table. | The table is created and data is ingested. |
| .set-or-replace | Data replaces the data in the table. | The table is created and data is ingested. |

To cancel an ingest from query command, see cancel operation.

Permissions

To perform different actions on a table, you need specific permissions:

  • To add rows to an existing table using the .append command, you need a minimum of Table Ingestor permissions.
  • To create a new table using the various .set commands, you need a minimum of Database User permissions.
  • To replace rows in an existing table using the .set-or-replace command, you need a minimum of Table Admin permissions.

For more information on permissions, see Kusto role-based access control.

Syntax

(.set | .append | .set-or-append | .set-or-replace) [async] tableName [with (propertyName = propertyValue [, …])] <| queryOrCommand

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| async | string | | If specified, the command returns immediately and continues ingestion in the background. Use the returned OperationId with the .show operations command to retrieve the ingestion completion status and results. |
| tableName | string | ✔️ | The name of the table to ingest data into. The tableName is always relative to the database in context. |
| propertyName, propertyValue | string | | One or more supported ingestion properties used to control the ingestion process. |
| queryOrCommand | string | ✔️ | The text of a query or a management command whose results are used as data to ingest. Only .show management commands are supported. |

Performance tips

  • Set the distributed property to true if the amount of data produced by the query is large, exceeds one gigabyte (GB), and doesn’t require serialization. Then, multiple nodes can produce output in parallel. Don’t use this flag when query results are small, since it might needlessly generate many small data shards.
  • Data ingestion is a resource-intensive operation that might affect concurrent activities on the database, including running queries. Avoid running too many ingestion commands at the same time.
  • Limit the data for ingestion to less than one GB per ingestion operation. If necessary, use multiple ingestion commands.

Supported ingestion properties

| Property | Type | Description |
|--|--|--|
| distributed | bool | If true, the command ingests from all nodes executing the query in parallel. Default is false. See performance tips. |
| creationTime | string | The datetime value, formatted as an ISO8601 string, to use as the creation time of the ingested data extents. If unspecified, now() is used. When specified, make sure the Lookback property in the target table’s effective Extents merge policy is aligned with the specified value. |
| extend_schema | bool | If true, the command might extend the schema of the table. Default is false. This option applies only to the .append, .set-or-append, and .set-or-replace commands. This option requires at least Table Admin permissions. |
| recreate_schema | bool | If true, the command might recreate the schema of the table. Default is false. This option applies only to the .set-or-replace command. This option takes precedence over the extend_schema property if both are set. This option requires at least Table Admin permissions. |
| folder | string | The folder to assign to the table. If the table already exists, this property overwrites the table’s folder. |
| ingestIfNotExists | string | If specified, ingestion fails if the table already has data tagged with an ingest-by: tag with the same value. For more information, see ingest-by: tags. |
| policy_ingestiontime | bool | If true, the Ingestion Time Policy is enabled on the table. The default is true. |
| tags | string | A JSON string that represents a list of tags to associate with the created extent. |
| docstring | string | A description used to document the table. |
| persistDetails | bool | If true, the command persists the detailed results for retrieval by the .show operation details command. Defaults to false. Example: with (persistDetails=true) |

Schema considerations

  • .set-or-replace preserves the schema unless one of extend_schema or recreate_schema ingestion properties is set to true.
  • .set-or-append and .append commands preserve the schema unless the extend_schema ingestion property is set to true.
  • Matching the result set schema to that of the target table is based on the column types. There’s no matching of column names. Make sure that the query result schema columns are in the same order as the table, otherwise data is ingested into the wrong columns (see the sketch after this list).
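
A minimal sketch of ordering the columns, assuming a target table named ErrorSummary with the columns (Timestamp:datetime, Level:string); the project operator forces the query output into that exact column order:

.set-or-append ErrorSummary <|
   LogsTable
   | project Timestamp, Level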

Character limitation

The command fails if the query generates an entity name with the $ character. The entity names must comply with the naming rules, so the $ character must be removed for the ingest command to succeed.

For example, in the following query, the search operator generates a column $table. To store the query results, use project-rename to rename the column.

.set Texas <| search State has 'Texas' | project-rename tableName=$table

Returns

Returns information on the extents created because of the .set or .append command.

Examples

Create and update table from query source

The following query creates the RecentErrors table with the same schema as LogsTable. It updates RecentErrors with all error logs from LogsTable over the last hour.

.set RecentErrors <|
   LogsTable
   | where Level == "Error" and Timestamp > now() - time(1h)

Create and update table from query source using the distributed flag

The following example creates a new table called OldExtents in the database, asynchronously. The dataset is expected to be bigger than one GB (more than ~one million rows) so the distributed flag is used. It updates OldExtents with ExtentId entries from the MyExtents table that were created more than 30 days ago.

.set async OldExtents with(distributed=true) <|
   MyExtents 
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Append data to table

The following example filters ExtentId entries in the MyExtents table that were created more than 30 days ago and appends the entries to the OldExtents table with associated tags.

.append OldExtents with(tags='["TagA","TagB"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Create or append a table with possibly existing tagged data

The following example either appends to or creates the OldExtents table asynchronously. It filters ExtentId entries in the MyExtents table that were created more than 30 days ago and specifies the tags to append to the new extents with ingest-by:myTag. The ingestIfNotExists parameter ensures that the ingestion only occurs if the data doesn’t already exist in the table with the specified tag.

.set-or-append async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
   MyExtents
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Create table or replace data with associated data

The following query replaces the data in the OldExtents table, or creates the table if it doesn’t already exist, with ExtentId entries in the MyExtents table that were created more than 30 days ago. Tag the new extent with ingest-by:myTag if the data doesn’t already exist in the table with the specified tag.

.set-or-replace async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Append data with associated data

The following example appends data to the OldExtents table asynchronously, using ExtentId entries from the MyExtents table that were created more than 30 days ago. It sets a specific creation time for the new extents.

.append async OldExtents with(creationTime='2017-02-13T11:09:36.7992775Z') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId     

Sample output

The following is a sample of the type of output you may see from your queries.

| ExtentId | OriginalSize | ExtentSize | CompressedSize | IndexSize | RowCount |
|--|--|--|--|--|--|
| 23a05ed6-376d-4119-b1fc-6493bcb05563 | 1291 | 5882 | 1568 | 4314 | 10 |

15 - Kusto.ingest into command (pull data from storage)

This article describes the .ingest into command (pull data from storage).

The .ingest into command ingests data into a table by “pulling” the data from one or more cloud storage files. For example, the command can retrieve 1,000 CSV-formatted blobs from Azure Blob Storage, parse them, and ingest them together into a single target table. Data is appended to the table without affecting existing records, and without modifying the table’s schema.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest [async] into table TableName SourceDataLocator [with ( IngestionPropertyName = IngestionPropertyValue [, …] )]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| async | string | | If specified, the command returns immediately and continues ingestion in the background. The results of the command include an OperationId value that can then be used with the .show operation command to retrieve the ingestion completion status and results. |
| TableName | string | ✔️ | The name of the table into which to ingest data. The table name is always relative to the database in context. If no schema mapping object is provided, the schema of the database in context is used. |
| SourceDataLocator | string | ✔️ | A single storage connection string or a comma-separated list of storage connection strings. A single connection string must refer to a single file hosted by a storage account. Ingestion of multiple files can be done by specifying multiple connection strings, or by ingesting from a query of an external table. |

Authentication and authorization

Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.

The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.

| Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
|--|--|--|
| Impersonation | Storage Blob Data Reader | Reader |
| Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
| Microsoft Entra access token | | |
| Storage account access key | | This authentication method isn’t supported in Gen1. |
| Managed identity | Storage Blob Data Reader | Reader |

Returns

The result of the command is a table with as many records as there are data shards (“extents”) generated by the command. If no data shards were generated, a single record is returned with an empty (zero-valued) extent ID.

| Name | Type | Description |
|--|--|--|
| ExtentId | guid | The unique identifier for the data shard that was generated by the command. |
| ItemLoaded | string | One or more storage files that are related to this record. |
| Duration | timespan | How long it took to perform the ingestion. |
| HasErrors | bool | Whether or not this record represents an ingestion failure. |
| OperationId | guid | A unique ID representing the operation. Can be used with the .show operation command. |

Examples

Azure Blob Storage with shared access signature

The following example instructs your database to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table T. The ... represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Obfuscated strings (the h in front of the string values) are used to ensure that the SAS is never recorded.

.ingest into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...',
    h'https://contoso.blob.core.windows.net/container/file2.csv?...'
)

Azure Blob Storage with managed identity

The following example shows how to read a CSV file from Azure Blob Storage and ingest its contents into table T using managed identity authentication. Authentication uses the managed identity ID (object ID) assigned to the Azure Blob Storage in Azure. For more information, see Create a managed identity for storage containers.

.ingest into table T ('https://StorageAccount.blob.core.windows.net/Container/file.csv;managed_identity=802bada6-4d21-44b2-9d15-e66b29e4d63e')

Azure Data Lake Storage Gen 2

The following example is for ingesting data from Azure Data Lake Storage Gen 2 (ADLSv2). The credentials used here (...) are the storage account credentials (shared key), and we use string obfuscation only for the secret part of the connection string.

.ingest into table T (
  'abfss://myfilesystem@contoso.dfs.core.windows.net/path/to/file1.csv;...'
)

Azure Data Lake Storage

The following example ingests a single file from Azure Data Lake Storage (ADLS). It uses the user’s credentials to access ADLS (so there’s no need to treat the storage URI as containing a secret). It also shows how to specify ingestion properties.

.ingest into table T ('adl://contoso.azuredatalakestore.net/Path/To/File/file1.ext;impersonate')
  with (format='csv')

Amazon S3 with an access key

The following example ingests a single file from Amazon S3 using an access key ID and a secret access key.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv;AwsCredentials=AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
  with (format='csv')

Amazon S3 with a presigned URL

The following example ingests a single file from Amazon S3 using a presigned URL. The command below is a sketch; the presigned URL query string is elided.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv?...')
  with (format='csv')

16 - Streaming ingestion

16.1 - Clearing cached schema for streaming ingestion

This article describes the management command for clearing the cached database schema.

Nodes cache the schema of the databases that receive data via streaming ingestion. This process optimizes performance and utilization of resources, but can cause propagation delays when the schema changes.

Clear the cache to guarantee that subsequent streaming ingestion requests incorporate database or table schema changes. For more information, see Streaming ingestion and schema changes.

Permissions

You must have at least Database Ingestor permissions to run this command.

Syntax

.clear table TableName cache streamingingestion schema

.clear database cache streamingingestion schema

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| TableName | string | ✔️ | The name of the table for which to clear the cache. |

Returns

This command returns a table with the following columns:

| Column | Type | Description |
|--|--|--|
| NodeId | string | Identifier of the node. |
| Status | string | Succeeded/Failed |

Example

.clear database cache streamingingestion schema

.clear table T1 cache streamingingestion schema

| NodeId | Status |
|--|--|
| Node1 | Succeeded |
| Node2 | Failed |

16.2 - Streaming ingestion and schema changes

This article discusses options of handling schema changes with streaming ingestion.

Cluster or Eventhouse nodes cache the schema of databases that get data through streaming ingestion, boosting performance and resource use. However, when there are schema changes, it can lead to delays in updates.

If schema changes and streaming ingestion aren’t synchronized, you can encounter failures like schema-related errors or incomplete and distorted data in the table.

This article outlines typical schema changes and provides guidance on avoiding problems with streaming ingestion during these changes.

Schema changes

The following list covers key examples of schema changes:

Coordinate schema changes with streaming ingestion

The schema cache is kept while the database is online. If there are schema changes, the system automatically refreshes the cache, but this refresh can take several minutes. If you rely on the automatic refresh, you can experience uncoordinated ingestion failures.

You can reduce the effects of propagation delay by explicitly clearing the schema cache on the nodes. If the streaming ingestion flow and schema changes are coordinated, you can completely eliminate failures and their associated data distortion.

To coordinate the streaming ingestion flow with schema changes:

  1. Suspend streaming ingestion.
  2. Wait until all outstanding streaming ingestion requests are complete.
  3. Make the schema changes (see the sketch after this list).
  4. Issue one or several .clear cache streaming ingestion schema commands.
    • Repeat until successful and all rows in the command output indicate success
  5. Resume streaming ingestion.
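
As an illustration of steps 3 and 4, assuming the schema change is adding a column to a table named MyStreamTable, the pair of commands might look like this sketch:

.alter-merge table MyStreamTable (NewColumn: string)

.clear table MyStreamTable cache streamingingestion schema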