1 - .ingest inline command (push)

This article describes the .ingest inline command (push).

This command ingests data into a table by pushing the data that's included in the command itself.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] <| Data

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] [ Data ]

Parameters

|Name|Type|Required|Description|
|--|--|--|--|
|TableName|string|✔️|The name of the table into which to ingest data. The table name is always relative to the database in context. Its schema is the default schema assumed for the data if no schema mapping object is provided.|
|Data|string|✔️|The data content to ingest. Unless otherwise modified by the ingestion properties, this content is parsed as CSV.|
|IngestionPropertyName, IngestionPropertyValue|string||Any number of ingestion properties that affect the ingestion process.|

Returns

The result is a table with as many records as the number of generated data shards (“extents”). If no data shards are generated, a single record is returned with an empty (zero-valued) extent ID.

|Name|Type|Description|
|--|--|--|
|ExtentId|guid|The unique identifier for the data shard that's generated by the command.|

Examples

Ingest with <| syntax

The following command ingests data into a table Purchases with two columns: SKU (of type string) and Quantity (of type long).

.ingest inline into table Purchases <|
    Shoes,1000
    Wide Shoes,50
    "Coats black",20
    "Coats with ""quotes""",5

Ingest with bracket syntax

The following command ingests data into a table Logs with two columns: Date (of type datetime) and EventDetails (of type dynamic).

.ingest inline into table Logs
    [2015-01-01,"{""EventType"":""Read"", ""Count"":""12""}"]
    [2015-01-01,"{""EventType"":""Write"", ""EventValue"":""84""}"]

2 - .show data operations

Learn how to use the .show data operations command to return data operations that reached a final state.

Returns a table with data operations that reached a final state. Data operations are available for 30 days from when they ran.

Any operation that results in new extents (data shards) added to the system is considered a data operation.

Permissions

You must have Database Admin or Database Monitor permissions to see any data operations invoked on your database.

Any user can see their own data operations.

For more information, see Kusto role-based access control.

Syntax

.show data operations

Returns

This command returns a table with the following columns:

|Output parameter|Type|Description|
|--|--|--|
|Timestamp|datetime|The time when the operation reached its final state.|
|Database|string|The database name.|
|Table|string|The table name.|
|ClientActivityId|string|The operation client activity ID.|
|OperationKind|string|One of BatchIngest, SetOrAppend, RowStoreSeal, MaterializedView, QueryAcceleration, and UpdatePolicy.|
|OriginalSize|long|The original size of the ingested data.|
|ExtentSize|long|The extent size.|
|RowCount|long|The number of rows in the extent.|
|ExtentCount|int|The number of extents.|
|TotalCpu|timespan|The total CPU time used by the data operation.|
|Duration|timespan|The duration of the operation.|
|Principal|string|The identity that initiated the data operation.|
|Properties|dynamic|Additional information about the data operation.|

Example

The following example returns information about UpdatePolicy, BatchIngest, and SetOrAppend operations.

.show data operations

Output

|Timestamp|Database|Table|ClientActivityId|OperationKind|OriginalSize|ExtentSize|RowCount|ExtentCount|TotalCpu|Duration|Principal|Properties|
|--|--|--|--|--|--|--|--|--|--|--|--|--|
|2024-07-18 15:21:10.5432134|TestLogs|UTResults|DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1|UpdatePolicy|100,829|75,578|279|1|00:00:00.2656250|00:00:28.9101535|aadapp=xxx|{"SourceTable": "UTLogs"}|
|2024-07-18 15:21:12.9481819|TestLogs|UTLogs|DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1|BatchIngest|1,045,027,298|123,067,947|1,688,705|2|00:00:22.9843750|00:00:29.9745733|aadapp=xxx|{"Format": "Csv", "NumberOfInputStreams": 2}|
|2024-07-18 15:21:16.1095441|KustoAuto|IncidentKustoGPTSummary|cdef12345-6789-ghij-0123-klmn45678|SetOrAppend|1,420|3,190|1|1|00:00:00.0156250|00:00:00.0638211|aaduser=xxx||
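
The command output is tabular, so it can typically be narrowed with query operators. The following sketch assumes you only want to see batch ingestion operations:

.show data operations
| where OperationKind == "BatchIngest"
| project Timestamp, Database, Table, RowCount, Duration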

3 - Data formats supported for ingestion

Learn about the various data and compression formats supported for ingestion.

Data ingestion is the process by which data is added to a table and made available for query. For all ingestion methods other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats supported for data ingestion.

For more information about why ingestion might fail, see Ingestion failures and Ingestion error codes in Azure Data Explorer.

|Format|Extension|Description|
|--|--|--|
|ApacheAvro|.avro|An AVRO format with support for logical types. The following compression codecs are supported: null, deflate, and snappy. Reader implementation of the apacheavro format is based on the official Apache Avro library. For information about ingesting Event Hub Capture Avro files, see Ingesting Event Hub Capture Avro files.|
|Avro|.avro|A legacy implementation for AVRO format based on .NET library. The following compression codecs are supported: null, deflate (for snappy, use the ApacheAvro data format).|
|CSV|.csv|A text file with comma-separated values (,). See RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files.|
|JSON|.json|A text file with JSON objects delimited by \n or \r\n. See JSON Lines (JSONL).|
|MultiJSON|.multijson|A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, \n or \r\n. Each property bag can be spread on multiple lines.|
|ORC|.orc|An ORC file.|
|Parquet|.parquet|A Parquet file.|
|PSV|.psv|A text file with pipe-separated values (\|).|
|RAW|.raw|A text file whose entire contents is a single string value.|
|SCsv|.scsv|A text file with semicolon-separated values (;).|
|SOHsv|.sohsv|A text file with SOH-separated values. (SOH is ASCII codepoint 1; this format is used by Hive on HDInsight.)|
|TSV|.tsv|A text file with tab-separated values (\t).|
|TSVE|.tsv|A text file with tab-separated values (\t). A backslash character (\) is used for escaping.|
|TXT|.txt|A text file with lines delimited by \n. Empty lines are skipped.|
|W3CLOGFILE|.log|Web log file format standardized by the W3C.|

For more info on ingesting data using json or multijson formats, see ingest json formats.

Supported data compression formats

Blobs and files can be compressed through any of the following compression algorithms:

|Compression|Extension|
|--|--|
|gzip|.gz|
|zip|.zip|

Indicate compression by appending the extension to the name of the blob or file.

For example:

  • MyData.csv.zip indicates a blob or a file formatted as CSV, compressed with zip (archive or a single file)
  • MyData.json.gz indicates a blob or a file formatted as JSON, compressed with gzip.

Blob or file names that don't include the format extension but only the compression extension (for example, MyData.zip) are also supported. In this case, the file format must be specified as an ingestion property, because it can't be inferred from the name.
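
For example, a minimal sketch of ingesting such a blob (hypothetical table name and storage URI), where the format is declared as an ingestion property because the blob name carries only the compression extension:

.ingest into table MyTable ('https://contoso.blob.core.windows.net/container/MyData.zip;...')
  with (format='csv')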

4 - Data ingestion properties

Learn about the various data ingestion properties.

Data ingestion is the process by which data is added to a table and is made available for query. You add properties to the ingestion command after the with keyword.
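
For example, the following sketch (hypothetical table name and storage URI) passes the format, ignoreFirstRecord, and tags ingestion properties after the with keyword:

.ingest into table MyTable ('https://contoso.blob.core.windows.net/container/data.csv;...')
  with (format='csv', ignoreFirstRecord=true, tags='["ingest-by:sample"]')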

5 - Ingest from query

5.1 - .cancel operation command

Learn how to use the .cancel operation command to cancel a long-running operation.

This command cancels a long-running ingest from query operation. It's useful when the operation is taking too long and you want to abort it while it's still running.

The cancel operation command isn’t guaranteed to succeed. The output of the .cancel operation command indicates whether or not cancellation was successful.

Syntax

.cancel operation OperationId [with ( reason = ReasonPhrase )]

Parameters

|Name|Type|Required|Description|
|--|--|--|--|
|OperationId|guid|✔️|The operation ID returned from the running command.|
|ReasonPhrase|string||The reason for canceling the running command.|

Returns

|Output parameter|Type|Description|
|--|--|--|
|OperationId|guid|The operation ID of the operation that was canceled.|
|Operation|string|The operation kind that was canceled.|
|StartedOn|datetime|The start time of the operation that was canceled.|
|CancellationState|string|One of the following options: Canceled successfully (the operation was canceled), or Cancel failed (the operation can't be canceled at this point; it may still be running or may have completed).|
|ReasonPhrase|string|The reason why cancellation wasn't successful.|

Example

.cancel operation 078b2641-f10d-4694-96f8-1ee2b75dda48 with(Reason="Command canceled by me")

|OperationId|Operation|StartedOn|CancellationState|ReasonPhrase|
|--|--|--|--|--|
|078b2641-f10d-4694-96f8-1ee2b75dda48|TableSetOrAppend|2022-07-18 09:03:55.1387320|Canceled successfully|Command canceled by me|
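
The operation ID to cancel is the one returned by the async command that started the ingestion. As a rough sketch, borrowing table names from the ingest-from-query examples that follow, an asynchronous command such as the following returns an OperationId:

.set-or-append async OldExtents <|
   MyExtents
   | where CreatedOn < now() - time(30d)
   | project ExtentId

That OperationId can then be supplied to .cancel operation, as shown in the example above.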

5.2 - Kusto query ingestion (set, append, replace)

Learn how to use the .set, .append, .set-or-append, and .set-or-replace commands to ingest data from a query.

These commands execute a query or a management command and ingest the results of the query into a table. The difference between these commands is how they treat existing or nonexistent tables and data.

|Command|If table exists|If table doesn't exist|
|--|--|--|
|.set|The command fails.|The table is created and data is ingested.|
|.append|Data is appended to the table.|The command fails.|
|.set-or-append|Data is appended to the table.|The table is created and data is ingested.|
|.set-or-replace|Data replaces the data in the table.|The table is created and data is ingested.|

To cancel an ingest from query command, see cancel operation.

Permissions

To perform different actions on a table, you need specific permissions:

  • To add rows to an existing table using the .append command, you need a minimum of Table Ingestor permissions.
  • To create a new table using the various .set commands, you need a minimum of Database User permissions.
  • To replace rows in an existing table using the .set-or-replace command, you need a minimum of Table Admin permissions.

For more information on permissions, see Kusto role-based access control.

Syntax

(.set | .append | .set-or-append | .set-or-replace) [async] tableName [with (propertyName = propertyValue [, …])] <| queryOrCommand

Parameters

|Name|Type|Required|Description|
|--|--|--|--|
|async|string||If specified, the command returns immediately and continues ingestion in the background. Use the returned OperationId with the .show operations command to retrieve the ingestion completion status and results.|
|tableName|string|✔️|The name of the table to ingest data into. The tableName is always relative to the database in context.|
|propertyName, propertyValue|string||One or more supported ingestion properties used to control the ingestion process.|
|queryOrCommand|string|✔️|The text of a query or a management command whose results are used as data to ingest. Only .show management commands are supported.|

Performance tips

  • Set the distributed property to true if the amount of data produced by the query is large, exceeds one gigabyte (GB), and doesn’t require serialization. Then, multiple nodes can produce output in parallel. Don’t use this flag when query results are small, since it might needlessly generate many small data shards.
  • Data ingestion is a resource-intensive operation that might affect concurrent activities on the database, including running queries. Avoid running too many ingestion commands at the same time.
  • Limit the data for ingestion to less than one GB per ingestion operation. If necessary, use multiple ingestion commands.

Supported ingestion properties

|Property|Type|Description|
|--|--|--|
|distributed|bool|If true, the command ingests from all nodes executing the query in parallel. Default is false. See performance tips.|
|creationTime|string|The datetime value, formatted as an ISO8601 string, to use as the creation time of the ingested data extents. If unspecified, now() is used. When specified, make sure the Lookback property in the target table's effective Extents merge policy is aligned with the specified value.|
|extend_schema|bool|If true, the command might extend the schema of the table. Default is false. This option applies only to the .append, .set-or-append, and .set-or-replace commands. This option requires at least Table Admin permissions.|
|recreate_schema|bool|If true, the command might recreate the schema of the table. Default is false. This option applies only to the .set-or-replace command. This option takes precedence over the extend_schema property if both are set. This option requires at least Table Admin permissions.|
|folder|string|The folder to assign to the table. If the table already exists, this property overwrites the table's folder.|
|ingestIfNotExists|string|If specified, ingestion fails if the table already has data tagged with an ingest-by: tag with the same value. For more information, see ingest-by: tags.|
|policy_ingestiontime|bool|If true, the Ingestion Time Policy is enabled on the table. The default is true.|
|tags|string|A JSON string that represents a list of tags to associate with the created extent.|
|docstring|string|A description used to document the table.|
|persistDetails|bool|A Boolean value that, if specified, indicates that the command should persist the detailed results for retrieval by the .show operation details command. Defaults to false. (Example: with (persistDetails=true))|

Schema considerations

  • .set-or-replace preserves the schema unless one of the extend_schema or recreate_schema ingestion properties is set to true.
  • .set-or-append and .append commands preserve the schema unless the extend_schema ingestion property is set to true (see the sketch after this list).
  • Matching the result set schema to that of the target table is based on the column types. There’s no matching of column names. Make sure that the query result schema columns are in the same order as the table, otherwise data is ingested into the wrong columns.
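
For example, a minimal sketch of extending the target table's schema during ingestion, assuming hypothetical table and column names where the query result carries a column that doesn't yet exist in the target table:

.set-or-append MyTargetTable with (extend_schema=true) <|
   MySourceTable
   | project Timestamp, Level, Message, NewDetailsColumn

Projecting the new column last keeps the existing columns aligned by position, as described in the considerations above.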

Character limitation

The command fails if the query generates an entity name with the $ character. The entity names must comply with the naming rules, so the $ character must be removed for the ingest command to succeed.

For example, in the following query, the search operator generates a column $table. To store the query results, use project-rename to rename the column.

.set Texas <| search State has 'Texas' | project-rename tableName=$table

Returns

Returns information on the extents created because of the .set or .append command.

Examples

Create and update table from query source

The following query creates the RecentErrors table with the same schema as LogsTable. It updates RecentErrors with all error logs from LogsTable over the last hour.

.set RecentErrors <|
   LogsTable
   | where Level == "Error" and Timestamp > now() - time(1h)

Create and update table from query source using the distributed flag

The following example creates a new table called OldExtents in the database, asynchronously. The dataset is expected to be bigger than one GB (more than ~one million rows) so the distributed flag is used. It updates OldExtents with ExtentId entries from the MyExtents table that were created more than 30 days ago.

.set async OldExtents with(distributed=true) <|
   MyExtents 
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Append data to table

The following example filters ExtentId entries in the MyExtents table that were created more than 30 days ago and appends the entries to the OldExtents table with associated tags.

.append OldExtents with(tags='["TagA","TagB"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Create or append a table with possibly existing tagged data

The following example either appends to or creates the OldExtents table asynchronously. It filters ExtentId entries in the MyExtents table that were created more than 30 days ago and specifies the tags to append to the new extents with ingest-by:myTag. The ingestIfNotExists parameter ensures that the ingestion only occurs if the data doesn’t already exist in the table with the specified tag.

.set-or-append async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
   MyExtents
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Create table or replace data with associated data

The following query replaces the data in the OldExtents table, or creates the table if it doesn’t already exist, with ExtentId entries in the MyExtents table that were created more than 30 days ago. Tag the new extent with ingest-by:myTag if the data doesn’t already exist in the table with the specified tag.

.set-or-replace async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Append data with associated data

The following example appends data to the OldExtents table asynchronously, using ExtentId entries from the MyExtents table that were created more than 30 days ago. It sets a specific creation time for the new extents.

.append async OldExtents with(creationTime='2017-02-13T11:09:36.7992775Z') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId     

Sample output

The following is a sample of the type of output you may see from your queries.

|ExtentId|OriginalSize|ExtentSize|CompressedSize|IndexSize|RowCount|
|--|--|--|--|--|--|
|23a05ed6-376d-4119-b1fc-6493bcb05563|1291|5882|1568|4314|10|
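
When the async flag is used, the command instead returns an OperationId. A minimal sketch of checking the completion status, assuming a hypothetical operation ID:

.show operations ab12cd34-5678-90ab-cdef-1234567890ab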

6 - Kusto.ingest into command (pull data from storage)

This article describes the .ingest into command (pull data from storage).

The .ingest into command ingests data into a table by “pulling” the data from one or more cloud storage files. For example, the command can retrieve 1,000 CSV-formatted blobs from Azure Blob Storage, parse them, and ingest them together into a single target table. Data is appended to the table without affecting existing records, and without modifying the table’s schema.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest [async] into table TableName SourceDataLocator [with ( IngestionPropertyName = IngestionPropertyValue [, …] )]

Parameters

|Name|Type|Required|Description|
|--|--|--|--|
|async|string||If specified, the command returns immediately and continues ingestion in the background. The results of the command include an OperationId value that can then be used with the .show operations command to retrieve the ingestion completion status and results.|
|TableName|string|✔️|The name of the table into which to ingest data. The table name is always relative to the database in context. If no schema mapping object is provided, the schema of the database in context is used.|
|SourceDataLocator|string|✔️|A single storage connection string, or a comma-separated list of storage connection strings. Each connection string must refer to a single file hosted by a storage account. Ingestion of multiple files can be done by specifying multiple connection strings, or by ingesting from a query of an external table.|

Authentication and authorization

Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.

The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.

|Authentication method|Azure Blob Storage / Data Lake Storage Gen2|Data Lake Storage Gen1|
|--|--|--|
|Impersonation|Storage Blob Data Reader|Reader|
|Shared Access (SAS) token|List + Read|This authentication method isn't supported in Gen1.|
|Microsoft Entra access token|||
|Storage account access key||This authentication method isn't supported in Gen1.|
|Managed identity|Storage Blob Data Reader|Reader|

Returns

The result of the command is a table with as many records as there are data shards (“extents”) generated by the command. If no data shards were generated, a single record is returned with an empty (zero-valued) extent ID.

|Name|Type|Description|
|--|--|--|
|ExtentId|guid|The unique identifier for the data shard that was generated by the command.|
|ItemLoaded|string|One or more storage files that are related to this record.|
|Duration|timespan|How long it took to perform ingestion.|
|HasErrors|bool|Whether or not this record represents an ingestion failure.|
|OperationId|guid|A unique ID representing the operation. Can be used with the .show operations command.|

Examples

Azure Blob Storage with shared access signature

The following example instructs your database to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table T. The ... represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Obfuscated strings (the h in front of the string values) are used to ensure that the SAS is never recorded.

.ingest into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...',
    h'https://contoso.blob.core.windows.net/container/file2.csv?...'
)

Azure Blob Storage with managed identity

The following example shows how to read a CSV file from Azure Blob Storage and ingest its contents into table T using managed identity authentication. Authentication uses the managed identity ID (object ID) assigned to the Azure Blob Storage in Azure. For more information, see Create a managed identity for storage containers.

.ingest into table T ('https://StorageAccount.blob.core.windows.net/Container/file.csv;managed_identity=802bada6-4d21-44b2-9d15-e66b29e4d63e')

Azure Data Lake Storage Gen 2

The following example is for ingesting data from Azure Data Lake Storage Gen 2 (ADLSv2). The credentials used here (...) are the storage account credentials (shared key), and we use string obfuscation only for the secret part of the connection string.

.ingest into table T (
  'abfss://myfilesystem@contoso.dfs.core.windows.net/path/to/file1.csv;...'
)

Azure Data Lake Storage

The following example ingests a single file from Azure Data Lake Storage (ADLS). It uses the user’s credentials to access ADLS (so there’s no need to treat the storage URI as containing a secret). It also shows how to specify ingestion properties.

.ingest into table T ('adl://contoso.azuredatalakestore.net/Path/To/File/file1.ext;impersonate')
  with (format='csv')

Amazon S3 with an access key

The following example ingests a single file from Amazon S3 using an access key ID and a secret access key.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv;AwsCredentials=AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
  with (format='csv')

Amazon S3 with a presigned URL

The following example ingests a single file from Amazon S3 using a presigned URL.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv?<presigned query parameters>')
  with (format='csv')

7 - Streaming ingestion

7.1 - Clearing cached schema for streaming ingestion

This article describes the management command for clearing the cached database schema.

Nodes cache the schema of databases that receive data via streaming ingestion. This process optimizes performance and resource utilization, but it can cause propagation delays when the schema changes.

Clear the cache to guarantee that subsequent streaming ingestion requests incorporate database or table schema changes. For more information, see Streaming ingestion and schema changes.

Permissions

You must have at least Database Ingestor permissions to run this command.

Syntax

.clear table TableName cache streamingingestion schema

.clear database cache streamingingestion schema

Parameters

|Name|Type|Required|Description|
|--|--|--|--|
|TableName|string|✔️|The name of the table for which to clear the cache.|

Returns

This command returns a table with the following columns:

|Column|Type|Description|
|--|--|--|
|NodeId|string|Identifier of the node|
|Status|string|Succeeded/Failed|

Example

.clear database cache streamingingestion schema

.clear table T1 cache streamingingestion schema

|NodeId|Status|
|--|--|
|Node1|Succeeded|
|Node2|Failed|

7.2 - Streaming ingestion and schema changes

This article discusses options of handling schema changes with streaming ingestion.

Cluster nodes cache the schema of databases that get data through streaming ingestion, boosting performance and resource use. However, when there are schema changes, it can lead to delays in updates.

If schema changes and streaming ingestion aren’t synchronized, you can encounter failures like schema-related errors or incomplete and distorted data in the table.

This article outlines typical schema changes and provides guidance on avoiding problems with streaming ingestion during these changes.

Schema changes

The following list covers key examples of schema changes:

Coordinate schema changes with streaming ingestion

The schema cache is kept while the database is online. If there are schema changes, the system automatically refreshes the cache, but this refresh can take several minutes. If you rely on the automatic refresh alone, streaming ingestion requests that arrive before the cache is updated can fail.

You can reduce the effects of propagation delay by explicitly clearing the schema cache on the nodes. If the streaming ingestion flow and schema changes are coordinated, you can completely eliminate failures and their associated data distortion.

To coordinate the streaming ingestion flow with schema changes:

  1. Suspend streaming ingestion.
  2. Wait until all outstanding streaming ingestion requests are complete.
  3. Do schema changes.
  4. Issue one or several .clear cache streamingingestion schema commands (see the sketch after these steps).
    • Repeat until successful, with all rows in the command output indicating success.
  5. Resume streaming ingestion.
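
For example, a rough sketch of steps 3 and 4, assuming a hypothetical streaming table whose schema is being extended with a new column:

.alter-merge table MyStreamingTable (NewColumn: string)

.clear table MyStreamingTable cache streamingingestion schema

The output of the .clear command should show Succeeded for all nodes before streaming ingestion resumes.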