Data ingestion
1 - .ingest inline command (push)
The .ingest inline command inserts data into a table by pushing the data included within the command itself to the table.
Permissions
You must have at least Table Ingestor permissions to run this command.
Syntax
.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] <| Data

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] [ Data ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
TableName | string | ✔️ | The name of the table into which to ingest data. The table name is always relative to the database in context. Its schema is the default schema assumed for the data if no schema mapping object is provided. |
Data | string | ✔️ | The data content to ingest. Unless otherwise modified by the ingestion properties, this content is parsed as CSV. |
IngestionPropertyName, IngestionPropertyValue | string | | Any number of ingestion properties that affect the ingestion process. |
Returns
The result is a table with as many records as the number of generated data shards (“extents”). If no data shards are generated, a single record is returned with an empty (zero-valued) extent ID.
Name | Type | Description |
---|---|---|
ExtentId | guid | The unique identifier for the data shard that’s generated by the command. |
Examples
Ingest with <| syntax
The following command ingests data into a table Purchases with two columns: SKU (of type string) and Quantity (of type long).
.ingest inline into table Purchases <|
Shoes,1000
Wide Shoes,50
"Coats black",20
"Coats with ""quotes""",5
Ingest with bracket syntax
The following command ingests data into a table Logs with two columns: Date (of type datetime) and EventDetails (of type dynamic).
.ingest inline into table Logs
[2015-01-01,"{""EventType"":""Read"", ""Count"":""12""}"]
[2015-01-01,"{""EventType"":""Write"", ""EventValue"":""84""}"]
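Ingest with ingestion properties
The with clause accepts the ingestion properties shown in the syntax. The following command is a minimal sketch, not from the original article; the tag value is an illustrative assumption. It ingests inline CSV data while tagging the resulting extent:
.ingest inline into table Purchases with(tags='["ingest-by:inline-sample"]') <|
Shoes,1000
Wide Shoes,50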
Related content
2 - .show data operations
The .show data operations command returns a table with data operations that reached a final state. Data operations are available for 30 days from when they ran.
Any operation that results in new extents (data shards) added to the system is considered a data operation.
Permissions
You must have Database Admin or Database Monitor permissions to see any data operations invoked on your database.
Any user can see their own data operations.
For more information, see Kusto role-based access control.
Syntax
.show
data
operations
Returns
This command returns a table with the following columns:
Output parameter | Type | Description |
---|---|---|
Timestamp | datetime | The time when the operation reached its final state. |
Database | string | The database name. |
Table | string | The table name. |
ClientActivityId | string | The operation client activity ID. |
OperationKind | string | One of BatchIngest, SetOrAppend, RowStoreSeal, MaterializedView, QueryAcceleration, and UpdatePolicy. |
OriginalSize | long | The original size of the ingested data. |
ExtentSize | long | The extent size. |
RowCount | long | The number of rows in the extent. |
ExtentCount | int | The number of extents. |
TotalCpu | timespan | The total CPU time used by the data operation. |
Duration | timespan | The duration of the operation. |
Principal | string | The identity that initiated the data operation. |
Properties | dynamic | Additional information about the data operation. |
Example
The following example returns information about UpdatePolicy, BatchIngest, and SetOrAppend operations.
.show data operations
Output
Timestamp | Database | Table | ClientActivityId | OperationKind | OriginalSize | ExtentSize | RowCount | ExtentCount | TotalCpu | Duration | Principal | Properties |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2024-07-18 15:21:10.5432134 | TestLogs | UTResults | DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1 | UpdatePolicy | 100,829 | 75,578 | 279 | 1 | 00:00:00.2656250 | 00:00:28.9101535 | aadapp=xxx | {"SourceTable": "UTLogs"} |
2024-07-18 15:21:12.9481819 | TestLogs | UTLogs | DM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1 | BatchIngest | 1,045,027,298 | 123,067,947 | 1,688,705 | 2 | 00:00:22.9843750 | 00:00:29.9745733 | aadapp=xxx | {"Format": "Csv", "NumberOfInputStreams": 2} |
2024-07-18 15:21:16.1095441 | KustoAuto | IncidentKustoGPTSummary | cdef12345-6789-ghij-0123-klmn45678 | SetOrAppend | 1,420 | 3,190 | 1 | 1 | 00:00:00.0156250 | 00:00:00.0638211 | aaduser=xxx | |
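Because the command returns a table, you can pipe its output into a query. The following sketch is not part of the original article and its filter values are illustrative; it keeps only recent batch ingestion operations:
.show data operations
| where OperationKind == "BatchIngest" and Timestamp > ago(1d)
| project Timestamp, Database, Table, RowCount, Duration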
3 - Data formats supported for ingestion
Data ingestion is the process by which data is added to a table and is made available for query. For all ingestion methods, other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that are supported for data ingestion.
For more information about why ingestion might fail, see Ingestion failures and Ingestion error codes in Azure Data Explorer.
Format | Extension | Description |
---|---|---|
ApacheAvro | .avro | An AVRO format with support for logical types. The following compression codecs are supported: null, deflate, and snappy. Reader implementation of the apacheavro format is based on the official Apache Avro library. For information about ingesting Event Hub Capture Avro files, see Ingesting Event Hub Capture Avro files. |
Avro | .avro | A legacy implementation for AVRO format based on .NET library. The following compression codecs are supported: null, deflate (for snappy - use ApacheAvro data format). |
CSV | .csv | A text file with comma-separated values (,). See RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files. |
JSON | .json | A text file with JSON objects delimited by \n or \r\n. See JSON Lines (JSONL). |
MultiJSON | .multijson | A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, \n or \r\n. Each property bag can be spread on multiple lines. |
ORC | .orc | An ORC file. |
Parquet | .parquet | A Parquet file. |
PSV | .psv | A text file with pipe-separated values (|). |
RAW | .raw | A text file whose entire contents is a single string value. |
SCsv | .scsv | A text file with semicolon-separated values (;). |
SOHsv | .sohsv | A text file with SOH-separated values. (SOH is ASCII codepoint 1; this format is used by Hive on HDInsight.) |
TSV | .tsv | A text file with tab-separated values (\t). |
TSVE | .tsv | A text file with tab-separated values (\t). A backslash character (\) is used for escaping. |
TXT | .txt | A text file with lines delimited by \n. Empty lines are skipped. |
W3CLOGFILE | .log | Web log file format standardized by the W3C. |
For more information on ingesting data using json or multijson formats, see ingest json formats.
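As an illustration only (the table name, blob URI, and mapping name below are assumptions, not from the original article), a pull ingestion of MultiJSON data typically pairs the format property with a pre-created JSON ingestion mapping:
.ingest into table Events ('https://contoso.blob.core.windows.net/container/events.multijson;...')
with (format='multijson', ingestionMappingReference='EventsJsonMapping')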
Supported data compression formats
Blobs and files can be compressed through any of the following compression algorithms:
Compression | Extension |
---|---|
gzip | .gz |
zip | .zip |
Indicate compression by appending the extension to the name of the blob or file.
For example:
- MyData.csv.zip indicates a blob or a file formatted as CSV, compressed with zip (archive or a single file).
- MyData.json.gz indicates a blob or a file formatted as JSON, compressed with gzip.
Blob or file names that include only the compression extension and not the format extension (for example, MyData.zip) are also supported. In this case, the file format must be specified as an ingestion property because it can't be inferred.
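For example, the following sketch (the table name and blob URI are illustrative assumptions, not from the original article) ingests a compressed blob whose name carries only the .zip extension, so the format is passed explicitly:
.ingest into table T ('https://contoso.blob.core.windows.net/container/MyData.zip;...')
with (format='csv')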
Related content
- Learn more about supported data formats
- Learn more about Data ingestion properties
- Learn more about data ingestion
4 - Data ingestion properties
Data ingestion is the process by which data is added to a table and is made available for query. You add properties to the ingestion command after the with keyword.
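For example, a minimal sketch (the table name and blob URI are assumptions used only for illustration) that sets the data format and skips the CSV header row:
.ingest into table Trips ('https://contoso.blob.core.windows.net/container/trips.csv;...')
with (format='csv', ignoreFirstRecord=true)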
Related content
- Learn more about supported data formats
- Learn more about data ingestion
5 - Ingest from query
5.1 - .cancel operation command
The .cancel operation command cancels a long-running ingest from query operation. It's useful when the operation is taking too long and you would like to abort it while it's running.
The cancel operation command isn't guaranteed to succeed. The output of the .cancel operation command indicates whether or not cancellation was successful.
Syntax
.cancel operation OperationId [with ( reason = ReasonPhrase )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
OperationId | guid | ✔️ | A guid of the operation ID returned from the running command. |
ReasonPhrase | string | | The reason for canceling the running command. |
Returns
Output parameter | Type | Description |
---|---|---|
OperationId | guid | The operation ID of the operation that was canceled. |
Operation | string | The operation kind that was canceled. |
StartedOn | datetime | The start time of the operation that was canceled. |
CancellationState | string | Returns one of the following options: Cancelled successfully: the operation was canceled. Cancel failed: the operation can't be canceled at this point. The operation may still be running or may have completed. |
ReasonPhrase | string | Reason why cancellation wasn’t successful. |
Example
.cancel operation 078b2641-f10d-4694-96f8-1ee2b75dda48 with(Reason="Command canceled by me")
OperationId | Operation | StartedOn | CancellationState | ReasonPhrase |
---|---|---|---|---|
078b2641-f10d-4694-96f8-1ee2b75dda48 | TableSetOrAppend | 2022-07-18 09:03:55.1387320 | Canceled successfully | Command canceled by me |
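The OperationId passed to .cancel operation typically comes from the output of an async ingest-from-query command. The following sketch is not from the original article; the table names and operation ID are illustrative assumptions. First, start an async ingestion:
.set-or-append async OldExtents <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
If the returned OperationId were 9ee59f3f-1234-4d6a-8b2f-0123456789ab, the operation could then be canceled with:
.cancel operation 9ee59f3f-1234-4d6a-8b2f-0123456789ab with(reason="Taking too long")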
5.2 - Kusto query ingestion (set, append, replace)
These commands execute a query or a management command and ingest the results of the query into a table. The difference between these commands is how they treat existing or nonexistent tables and data.
Command | If table exists | If table doesn’t exist |
---|---|---|
.set | The command fails. | The table is created and data is ingested. |
.append | Data is appended to the table. | The command fails. |
.set-or-append | Data is appended to the table. | The table is created and data is ingested. |
.set-or-replace | Data replaces the data in the table. | The table is created and data is ingested. |
To cancel an ingest from query command, see cancel operation.
Permissions
To perform different actions on a table, you need specific permissions:
- To add rows to an existing table using the .append command, you need a minimum of Table Ingestor permissions.
- To create a new table using the various .set commands, you need a minimum of Database User permissions.
- To replace rows in an existing table using the .set-or-replace command, you need a minimum of Table Admin permissions.
For more information on permissions, see Kusto role-based access control.
Syntax
(.set | .append | .set-or-append | .set-or-replace) [async] tableName [with ( propertyName = propertyValue [, …] )] <| queryOrCommand
Parameters
Name | Type | Required | Description |
---|---|---|---|
async | string | | If specified, the command returns immediately and continues ingestion in the background. Use the returned OperationId with the .show operations command to retrieve the ingestion completion status and results. |
tableName | string | ✔️ | The name of the table to ingest data into. The tableName is always relative to the database in context. |
propertyName, propertyValue | string | | One or more supported ingestion properties used to control the ingestion process. |
queryOrCommand | string | ✔️ | The text of a query or a management command whose results are used as data to ingest. Only .show management commands are supported. |
Performance tips
- Set the distributed property to true if the amount of data produced by the query is large, exceeds one gigabyte (GB), and doesn't require serialization. Then, multiple nodes can produce output in parallel. Don't use this flag when query results are small, since it might needlessly generate many small data shards.
- Data ingestion is a resource-intensive operation that might affect concurrent activities on the database, including running queries. Avoid running too many ingestion commands at the same time.
- Limit the data for ingestion to less than one GB per ingestion operation. If necessary, use multiple ingestion commands.
Supported ingestion properties
Property | Type | Description |
---|---|---|
distributed | bool | If true, the command ingests from all nodes executing the query in parallel. Default is false. See performance tips. |
creationTime | string | The datetime value, formatted as an ISO8601 string, to use as the creation time of the ingested data extents. If unspecified, now() is used. When specified, make sure the Lookback property in the target table's effective Extents merge policy is aligned with the specified value. |
extend_schema | bool | If true, the command might extend the schema of the table. Default is false. This option applies only to the .append, .set-or-append, and .set-or-replace commands. This option requires at least Table Admin permissions. |
recreate_schema | bool | If true, the command might recreate the schema of the table. Default is false. This option applies only to the .set-or-replace command. This option takes precedence over the extend_schema property if both are set. This option requires at least Table Admin permissions. |
folder | string | The folder to assign to the table. If the table already exists, this property overwrites the table's folder. |
ingestIfNotExists | string | If specified, ingestion fails if the table already has data tagged with an ingest-by: tag with the same value. For more information, see ingest-by: tags. |
policy_ingestiontime | bool | If true, the Ingestion Time Policy is enabled on the table. The default is true. |
tags | string | A JSON string that represents a list of tags to associate with the created extent. |
docstring | string | A description used to document the table. |
persistDetails | bool | If true, the command persists its detailed results for retrieval by the .show operation details command. Defaults to false. For example: with (persistDetails=true). |
Schema considerations
.set-or-replace
preserves the schema unless one ofextend_schema
orrecreate_schema
ingestion properties is set totrue
..set-or-append
and.append
commands preserve the schema unless theextend_schema
ingestion property is set totrue
.- Matching the result set schema to that of the target table is based on the column types. There’s no matching of column names. Make sure that the query result schema columns are in the same order as the table, otherwise data is ingested into the wrong columns.
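For example, a minimal sketch (the added column is an assumption for illustration, not from the original article) of an append that's allowed to extend the target table's schema when the query result introduces a new column:
.set-or-append OldExtents with(extend_schema=true) <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
| extend RetiredOn = now()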
Character limitation
The command fails if the query generates an entity name with the $ character. The entity names must comply with the naming rules, so the $ character must be removed for the ingest command to succeed.
For example, in the following query, the search operator generates a column $table. To store the query results, use project-rename to rename the column.
.set Texas <| search State has 'Texas' | project-rename tableName=$table
Returns
Returns information on the extents created because of the .set or .append command.
Examples
Create and update table from query source
The following query creates the RecentErrors table with the same schema as LogsTable. It updates RecentErrors with all error logs from LogsTable over the last hour.
.set RecentErrors <|
LogsTable
| where Level == "Error" and Timestamp > now() - time(1h)
Create and update table from query source using the distributed flag
The following example creates a new table called OldExtents in the database, asynchronously. The dataset is expected to be bigger than one GB (more than ~one million rows), so the distributed flag is used. It updates OldExtents with ExtentId entries from the MyExtents table that were created more than 30 days ago.
.set async OldExtents with(distributed=true) <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
Append data to table
The following example filters ExtentId entries in the MyExtents table that were created more than 30 days ago and appends the entries to the OldExtents table with associated tags.
.append OldExtents with(tags='["TagA","TagB"]') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
Create or append a table with possibly existing tagged data
The following example either appends to or creates the OldExtents table asynchronously. It filters ExtentId entries in the MyExtents table that were created more than 30 days ago and specifies the tags to append to the new extents with ingest-by:myTag. The ingestIfNotExists parameter ensures that the ingestion only occurs if the data doesn't already exist in the table with the specified tag.
.set-or-append async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
Create table or replace data with associated data
The following query replaces the data in the OldExtents table, or creates the table if it doesn't already exist, with ExtentId entries in the MyExtents table that were created more than 30 days ago. Tag the new extent with ingest-by:myTag if the data doesn't already exist in the table with the specified tag.
.set-or-replace async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
Append data with associated data
The following example appends data to the OldExtents table asynchronously, using ExtentId entries from the MyExtents table that were created more than 30 days ago. It sets a specific creation time for the new extents.
.append async OldExtents with(creationTime='2017-02-13T11:09:36.7992775Z') <|
MyExtents
| where CreatedOn < now() - time(30d)
| project ExtentId
Sample output
The following is a sample of the type of output you may see from your queries.
ExtentId | OriginalSize | ExtentSize | CompressedSize | IndexSize | RowCount |
---|---|---|---|---|---|
23a05ed6-376d-4119-b1fc-6493bcb05563 | 1291 | 5882 | 1568 | 4314 | 10 |
Related content
6 - Kusto .ingest into command (pull data from storage)
The .ingest into command ingests data into a table by "pulling" the data from one or more cloud storage files. For example, the command can retrieve 1,000 CSV-formatted blobs from Azure Blob Storage, parse them, and ingest them together into a single target table. Data is appended to the table without affecting existing records, and without modifying the table's schema.
Permissions
You must have at least Table Ingestor permissions to run this command.
Syntax
.ingest [async] into table TableName SourceDataLocator [with ( IngestionPropertyName = IngestionPropertyValue [, …] )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
async | string | | If specified, the command returns immediately and continues ingestion in the background. The results of the command include an OperationId value that can then be used with the .show operation command to retrieve the ingestion completion status and results. |
TableName | string | ✔️ | The name of the table into which to ingest data. The table name is always relative to the database in context. If no schema mapping object is provided, the schema of the database in context is used. |
SourceDataLocator | string | ✔️ | A single or comma-separated list of storage connection strings. A single connection string must refer to a single file hosted by a storage account. Ingestion of multiple files can be done by specifying multiple connection strings, or by ingesting from a query of an external table. |
Authentication and authorization
Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.
The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Storage Blob Data Reader | Reader |
Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
Microsoft Entra access token | ||
Storage account access key | | This authentication method isn't supported in Gen1. |
Managed identity | Storage Blob Data Reader | Reader |
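For example, the following sketch (the storage URI is an illustrative assumption, not from the original article) uses impersonation, so the calling principal needs the Storage Blob Data Reader role on the storage account and no secret is embedded in the connection string:
.ingest into table T ('https://contoso.blob.core.windows.net/container/file.csv;impersonate')
with (format='csv')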
Returns
The result of the command is a table with as many records as there are data shards (“extents”) generated by the command. If no data shards were generated, a single record is returned with an empty (zero-valued) extent ID.
Name | Type | Description |
---|---|---|
ExtentId | guid | The unique identifier for the data shard that was generated by the command. |
ItemLoaded | string | One or more storage files that are related to this record. |
Duration | timespan | How long it took to perform ingestion. |
HasErrors | bool | Whether or not this record represents an ingestion failure. |
OperationId | guid | A unique ID representing the operation. Can be used with the .show operation command. |
Examples
Azure Blob Storage with shared access signature
The following example instructs your database to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table T. The ... represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Obfuscated strings (the h in front of the string values) are used to ensure that the SAS is never recorded.
.ingest into table T (
h'https://contoso.blob.core.windows.net/container/file1.csv?...',
h'https://contoso.blob.core.windows.net/container/file2.csv?...'
)
Azure Blob Storage with managed identity
The following example shows how to read a CSV file from Azure Blob Storage and ingest its contents into table T using managed identity authentication. Authentication uses the managed identity ID (object ID) assigned to the Azure Blob Storage in Azure. For more information, see Create a managed identity for storage containers.
.ingest into table T ('https://StorageAccount.blob.core.windows.net/Container/file.csv;managed_identity=802bada6-4d21-44b2-9d15-e66b29e4d63e')
Azure Data Lake Storage Gen 2
The following example is for ingesting data from Azure Data Lake Storage Gen 2 (ADLSv2). The credentials used here (...) are the storage account credentials (shared key), and we use string obfuscation only for the secret part of the connection string.
.ingest into table T (
'abfss://myfilesystem@contoso.dfs.core.windows.net/path/to/file1.csv;...'
)
Azure Data Lake Storage
The following example ingests a single file from Azure Data Lake Storage (ADLS). It uses the user’s credentials to access ADLS (so there’s no need to treat the storage URI as containing a secret). It also shows how to specify ingestion properties.
.ingest into table T ('adl://contoso.azuredatalakestore.net/Path/To/File/file1.ext;impersonate')
with (format='csv')
Amazon S3 with an access key
The following example ingests a single file from Amazon S3 using an access key ID and a secret access key.
.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv;AwsCredentials=AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
with (format='csv')
Amazon S3 with a presigned URL
The following example ingests a single file from Amazon S3 using a presigned URL. The ... stands for the presigned query string.
.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv?...')
with (format='csv')
Related content
7 - Streaming ingestion
7.1 - Clearing cached schema for streaming ingestion
Nodes cache the schema of the databases that receive data via streaming ingestion. This process optimizes performance and utilization of resources, but can cause propagation delays when the schema changes.
Clear the cache to guarantee that subsequent streaming ingestion requests incorporate database or table schema changes. For more information, see Streaming ingestion and schema changes.
Permissions
You must have at least Database Ingestor permissions to run this command.
Syntax
.clear table TableName cache streamingingestion schema

.clear database cache streamingingestion schema
Parameters
Name | Type | Required | Description |
---|---|---|---|
TableName | string | ✔️ | The name of the table for which to clear the cache. |
Returns
This command returns a table with the following columns:
Column | Type | Description |
---|---|---|
NodeId | string | Identifier of the node |
Status | string | Succeeded/Failed |
Example
.clear database cache streamingingestion schema
.clear table T1 cache streamingingestion schema
NodeId | Status |
---|---|
Node1 | Succeeded |
Node2 | Failed |
7.2 - Streaming ingestion and schema changes
Cluster or Eventhouse nodes cache the schema of databases that get data through streaming ingestion, boosting performance and resource use. However, when there are schema changes, it can lead to delays in updates.
If schema changes and streaming ingestion aren’t synchronized, you can encounter failures like schema-related errors or incomplete and distorted data in the table.
This article outlines typical schema changes and provides guidance on avoiding problems with streaming ingestion during these changes.
Schema changes
The following list covers key examples of schema changes:
- Creation of tables
- Deletion of tables
- Adding a column to a table
- Removing a column from a table
- Retyping the columns of a table
- Renaming the columns of a table
- Adding precreated ingestion mappings
- Removing precreated ingestion mappings
- Adding, removing, or altering policies
Coordinate schema changes with streaming ingestion
The schema cache is kept while the database is online. If there are schema changes, the system automatically refreshes the cache, but this refresh can take several minutes. If you rely on the automatic refresh, you can experience uncoordinated ingestion failures.
You can reduce the effects of propagation delay by explicitly clearing the schema cache on the nodes. If the streaming ingestion flow and schema changes are coordinated, you can completely eliminate failures and their associated data distortion.
To coordinate the streaming ingestion flow with schema changes (a sketch of the corresponding commands follows this list):
- Suspend streaming ingestion.
- Wait until all outstanding streaming ingestion requests are complete.
- Do schema changes.
- Issue one or several .clear cache streaming ingestion schema commands.
- Repeat the previous step until it succeeds and all rows in the command output indicate success.
- Resume streaming ingestion.
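A minimal sketch of steps 3 and 4 (the table and column names are assumptions, not from the original article), where a column is added and the relevant caches are then cleared:
.alter-merge table MyStreamingTable (NewDimension: string)

.clear table MyStreamingTable cache streamingingestion schema

.clear database cache streamingingestion schema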