Continuous data export
- 1: .export to SQL
- 2: .export to storage
- 3: .export to table
- 4: Data export
- 5: Continuous data export
- 5.1: .create or alter continuous-export
- 5.2: .drop continuous-export
- 5.3: .show continuous data-export failures
- 5.4: .show continuous-export
- 5.5: .show continuous-export exported-artifacts
- 5.6: Continuous data export
- 5.7: Enable or disable continuous data export
- 5.8: Use a managed identity to run a continuous export job
1 - .export to SQL
Export data to SQL allows you to run a query and have its results sent to a table in an SQL database, such as an SQL database hosted by the Azure SQL Database service.
Permissions
You must have at least Table Admin permissions to run this command.
Syntax
.export [async] to sql SqlTableName SqlConnectionString [with ( PropertyName = PropertyValue [, ...] )] <| Query
Parameters
Name | Type | Required | Description |
---|---|---|---|
async | string | | If specified, the command runs asynchronously. |
SqlTableName | string | ✔️ | The name of the SQL database table into which to insert the data. To protect against injection attacks, this name is restricted. |
SqlConnectionString | string | ✔️ | The connection string for the SQL endpoint and database. The string must follow the ADO.NET connection string format. For security reasons, the connection string is restricted. |
PropertyName, PropertyValue | string | | A comma-separated list of optional properties. |
Supported properties
Name | Values | Description |
---|---|---|
firetriggers | true or false | If true , instructs the target system to fire INSERT triggers defined on the SQL table. The default is false . For more information, see BULK INSERT and System.Data.SqlClient.SqlBulkCopy. |
createifnotexists | true or false | If true , the target SQL table is created if it doesn’t already exist; the primarykey property must be provided in this case to indicate the result column that is the primary key. The default is false . |
primarykey | | If createifnotexists is true, this property indicates the name of the column in the result that is used as the SQL table’s primary key if it’s created by this command. |
persistDetails | bool | Indicates that the command should persist its results (see async flag). Defaults to true in async runs, but can be turned off if the caller doesn’t require the results. Defaults to false in synchronous executions, but can be turned on. |
token | string | The Microsoft Entra access token that Kusto forwards to the SQL endpoint for authentication. When set, the SQL connection string shouldn’t include authentication information like Authentication , User ID , or Password . |
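For illustration, the following is a minimal sketch of a synchronous export that uses the firetriggers property so that INSERT triggers fire on the target table. The table name, connection string, and the source table MyTable are placeholders, not names defined elsewhere in this article.
.export to sql ['dbo.MySqlTable']
    h@"Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=MyDatabase;Connection Timeout=30;"
    with (firetriggers="true")
<| MyTable | where Timestamp > ago(1d)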
Authentication and authorization
The authentication method is based on the connection string provided, and the permissions required to access the SQL database vary depending on the authentication method.
The supported authentication methods for exporting data to SQL are Microsoft Entra integrated (impersonation) authentication and username/password authentication. For impersonation authentication, be sure that the principal has the following permissions on the database:
- Existing table: table UPDATE and INSERT
- New table: CREATE, UPDATE, and INSERT
Limitations and restrictions
There are some limitations and restrictions when exporting data to an SQL database:
Kusto is a cloud service, so the connection string must point to a database that is accessible from the cloud. (In particular, one can’t export to an on-premises database since it’s not accessible from the public cloud.)
Kusto supports Active Directory Integrated authentication when the calling principal is a Microsoft Entra principal (aaduser= or aadapp=). Alternatively, Kusto also supports providing the credentials for the SQL database as part of the connection string. Other methods of authentication aren’t supported. The identity being presented to the SQL database always emanates from the command caller, not the Kusto service identity itself.

If the target table in the SQL database exists, it must match the query result schema. In some cases, such as Azure SQL Database, this means that the table has one column marked as an identity column.
Exporting large volumes of data might take a long time. It’s recommended that the target SQL table is set for minimal logging during bulk import. See SQL Server Database Engine > … > Database Features > Bulk Import and Export of Data.
Data export is performed using SQL bulk copy and provides no transactional guarantees on the target SQL database. See Transaction and Bulk Copy Operations.
The SQL table name is restricted to a name consisting of letters, digits, spaces, underscores (_), dots (.), and hyphens (-).

The SQL connection string is restricted as follows: Persist Security Info is explicitly set to false, Encrypt is set to true, and Trust Server Certificate is set to false.

The primary key property on the column can be specified when creating a new SQL table. If the column is of type string, then SQL might refuse to create the table due to other limitations on the primary key column. The workaround is to manually create the table in SQL before exporting the data. This limitation exists because primary key columns in SQL can’t be of unlimited size, but Kusto table columns don’t have declared size limitations.
For more information, see the Azure SQL Database Microsoft Entra integrated authentication documentation.
Examples
Asynchronous export to SQL table
In the following example, Kusto runs the query and then exports the first record set produced by the query to the MySqlTable
table in the MyDatabase
database in server myserver
.
.export async to sql MySqlTable
h@"Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=MyDatabase;Connection Timeout=30;"
<| print Id="d3b68d12-cbd3-428b-807f-2c740f561989", Name="YSO4", DateOfBirth=datetime(2017-10-15)
Export to SQL table if it doesn’t exist
In the following example, Kusto runs the query and then exports the first record set produced by the query to the MySqlTable
table in the MyDatabase
database in server myserver
.
The target table is created if it doesn’t exist in the target database.
.export async to sql ['dbo.MySqlTable']
h@"Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=MyDatabase;Connection Timeout=30;"
with (createifnotexists="true", primarykey="Id")
<| print Message = "Hello World!", Timestamp = now(), Id=12345678
Related content
2 - .export to storage
Executes a query and writes the first result set to an external cloud storage, specified by a storage connection string.
Permissions
You must have at least Database Viewer permissions to run this command.
Syntax
.export [async] [compressed] to OutputDataFormat ( StorageConnectionString [, ...] ) [with ( PropertyName = PropertyValue [, ...] )] <| Query
Parameters
Name | Type | Required | Description |
---|---|---|---|
async | string | | If specified, the command runs in asynchronous mode. See asynchronous mode. |
compressed | bool | | If specified, the output storage artifacts are compressed in the format specified by the compressionType supported property. |
OutputDataFormat | string | ✔️ | The data format of the storage artifacts written by the command. Supported values are: csv , tsv , json , and parquet . |
StorageConnectionString | string | ✔️ | One or more storage connection strings that specify which storage to write the data to. More than one storage connection string might be specified for scalable writes. Each such connection string must specify the credentials to use when writing to storage. For example, when writing to Azure Blob Storage, the credentials can be the storage account key, or a shared access key (SAS) with the permissions to read, write, and list blobs. When you export data to CSV files using a DFS endpoint, the data goes through a DFS managed private endpoint. When you export data to parquet files, the data goes through a blob managed private endpoint. |
PropertyName, PropertyValue | string | | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
Property | Type | Description |
---|---|---|
includeHeaders | string | For csv /tsv output, controls the generation of column headers. Can be one of none (default; no header lines emitted), all (emit a header line into every storage artifact), or firstFile (emit a header line into the first storage artifact only). |
fileExtension | string | The “extension” part of the storage artifact (for example, .csv or .tsv ). If compression is used, .gz is appended as well. |
namePrefix | string | The prefix to add to each generated storage artifact name. A random prefix is used if left unspecified. |
encoding | string | The encoding for text. Possible values include: UTF8NoBOM (default) or UTF8BOM . |
compressionType | string | The type of compression to use. For non-Parquet files, only gzip is allowed. For Parquet files, possible values include gzip , snappy , lz4_raw , brotli , and zstd . Default is gzip . |
distribution | string | Distribution hint (single , per_node , per_shard ). If value equals single , a single thread writes to storage. Otherwise, export writes from all nodes executing the query in parallel. See evaluate plugin operator. Defaults to per_shard . |
persistDetails | bool | If true , the command persists its results (see async flag). Defaults to true in async runs, but can be turned off if the caller doesn’t require the results. Defaults to false in synchronous executions, but can be turned on. |
sizeLimit | long | The size limit in bytes of a single storage artifact written before compression. Valid range: 100 MB (default) to 4 GB. |
parquetRowGroupSize | int | Relevant only when data format is Parquet. Controls the row group size in the exported files. Default row group size is 100,000 records. |
distributed | bool | Disable or enable distributed export. Setting to false is equivalent to single distribution hint. Default is true. |
parquetDatetimePrecision | string | The precision to use when exporting datetime values to Parquet. Possible values are millisecond and microsecond. Default is millisecond. |
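As a hedged illustration of these properties, the following sketch exports Parquet artifacts with Snappy compression and a larger row group size. The storage URI, secret key, and the Logs table are placeholders for your own values.
.export async compressed to parquet (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey"
) with (
    compressionType="snappy",
    parquetRowGroupSize=500000,
    namePrefix="export"
)
<| Logs | where Timestamp > ago(1d)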
Authentication and authorization
The authentication method is based on the connection string provided, and the permissions required vary depending on the authentication method.
The following table lists the supported authentication methods and the permissions needed for exporting data to external storage by storage type.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Storage Blob Data Contributor | Contributor |
Shared Access (SAS) token | Write | Write |
Microsoft Entra access token | No extra permissions required | No extra permissions required |
Storage account access key | No extra permissions required | No extra permissions required |
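For example, a minimal sketch of an export that relies on impersonation; per the table above, the calling principal needs the Storage Blob Data Contributor role on the target account. The storage account, container, and MyTable are placeholders.
.export to csv (
    h@"https://mystorageaccount.blob.core.windows.net/mycontainer;impersonate"
) <| MyTable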
Returns
The commands return a table that describes the generated storage artifacts. Each record describes a single artifact and includes the storage path to the artifact and how many records it holds.
Asynchronous mode
If the async
flag is specified, the command executes in asynchronous mode.
In this mode, the command returns immediately with an operation ID, and data
export continues in the background until completion. The operation ID returned
by the command can be used to track its progress and ultimately its results
via the following commands:
- .show operations: Track progress.
- .show operation details: Get completion results.
For example, after a successful completion, you can retrieve the results using:
.show operation f008dc1e-2710-47d8-8d34-0d562f5f8615 details
Examples
In this example, Kusto runs the query and then exports the first recordset produced by the query to one or more compressed CSV blobs, up to 1 GB before compression. Column name labels are added as the first row for each blob.
.export
async compressed
to csv (
h@"https://storage1.blob.core.windows.net/containerName;secretKey",
h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with (
sizeLimit=1000000000,
namePrefix="export",
includeHeaders="all",
encoding="UTF8NoBOM"
)
<|
Logs | where id == "1234"
Failures during export commands
Export commands can transiently fail during execution. Continuous export automatically retries the command. Regular export commands (export to storage, export to external table) don’t perform any retries.
- When the export command fails, artifacts already written to storage aren’t deleted. These artifacts remain in storage. If the command fails, assume the export is incomplete, even if some artifacts were written.
- The best way to track both completion of the command and the artifacts exported upon successful completion is by using the .show operations and .show operation details commands.
Storage failures
By default, export commands are distributed such that there might be many concurrent writes to storage. The level of distribution depends on the type of export command:
- The default distribution for the regular .export command is per_shard, which means all extents that contain data to export write to storage concurrently.
- The default distribution for export to external table commands is per_node, which means the concurrency is the number of nodes.
When the number of extents/nodes is large, this might lead to high load on storage that results in storage throttling, or transient storage errors. The following suggestions might overcome these errors (by order of priority):
Increase the number of storage accounts provided to the export command or to the external table definition. The load is evenly distributed between the accounts.
Reduce the concurrency by setting the distribution hint to per_node (see command properties).

Reduce the concurrency of the nodes exporting by setting the client request property query_fanout_nodes_percent to the desired concurrency (percent of nodes). The property can be set as part of the export query. For example, the following command limits the number of nodes writing to storage concurrently to 50% of the nodes:

.export async to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey"
) with ( distribution="per_node" )
<|
set query_fanout_nodes_percent = 50;
ExportQuery

Reduce the concurrency of the threads exporting in each node when using per-shard export, by setting the client request property query_fanout_threads_percent to the desired concurrency (percent of threads). The property can be set as part of the export query. For example, the following command limits the number of threads writing to storage concurrently to 50% on each of the nodes:

.export async to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey"
) with ( distribution="per_shard" )
<|
set query_fanout_threads_percent = 50;
ExportQuery
If exporting to a partitioned external table, setting the spread/concurrency properties can reduce concurrency (see details in the command properties).

If neither of the previous recommendations works, you can completely disable distribution by setting the distributed property to false. However, we don’t recommend doing so, as it might significantly affect the command performance.
Authorization failures
Authentication or authorization failures during export commands can occur when the credentials provided in the storage connection string aren’t permitted to write to storage. If you’re using impersonate
or a user-delegated SAS token for the export command, the Storage Blob Data Contributor role is required to write to the storage account. For more information, see Storage connection strings.
Data types mapping
Parquet data types mapping
On export, Kusto data types are mapped to Parquet data types using the following rules:
Kusto Data Type | Parquet Data Type | Parquet Annotation | Comments |
---|---|---|---|
bool | BOOLEAN | ||
datetime | INT64 | TIMESTAMP_MICROS | |
dynamic | BYTE_ARRAY | UTF-8 | Serialized as JSON string |
guid | BYTE_ARRAY | UTF-8 | |
int | INT32 | ||
long | INT64 | ||
real | DOUBLE | ||
string | BYTE_ARRAY | UTF-8 | |
timespan | INT64 | Stored as ticks (100-nanosecond units) count | |
decimal | FIXED_LENGTH_BYTE_ARRAY | DECIMAL |
Related content
3 - .export to table
You can export data by defining an external table and exporting data to it. The table properties are specified when creating the external table. The export command references the external table by name.
Permissions
You must have at least Table Admin permissions to run this command.
Syntax
.export [async] to table externalTableName [with ( propertyName = propertyValue [, ...] )] <| query
Parameters
Name | Type | Required | Description |
---|---|---|---|
externalTableName | string | ✔️ | The name of the external table to which to export. |
propertyName, propertyValue | string | | A comma-separated list of optional properties. |
query | string | ✔️ | The export query. |
Supported properties
The following properties are supported as part of the export to external table command.
Property | Type | Description | Default |
---|---|---|---|
sizeLimit | long | The size limit in bytes of a single storage artifact written before compression. A full row group of size parquetRowGroupSize is written before checking whether this row group reaches the size limit and should start a new artifact. Valid range: 100 MB (default) to 1 GB. | |
distributed | bool | Disable or enable distributed export. Setting to false is equivalent to single distribution hint. | true |
distribution | string | Distribution hint (single , per_node , per_shard ). See more details in Distribution settings | per_node |
distributionKind | string | Optionally switches to uniform distribution when the external table is partitioned by string partition. Valid values are uniform or default . See more details in Distribution settings | |
concurrency | Number | Hints the system how many partitions to run in parallel. See more details in Distribution settings | 16 |
spread | Number | Hints the system how to distribute the partitions among nodes. See more details in Distribution settings | Min(64, number-of-nodes) |
parquetRowGroupSize | int | Relevant only when data format is Parquet. Controls the row group size in the exported files. This value takes precedence over sizeLimit , meaning a full row group will be exported before checking whether this row group reaches the size limit and should start a new artifact. | 100,000 |
Distribution settings
The distribution of an export to external table operation indicates the number of nodes and threads that are writing to storage concurrently. The default distribution depends on the external table partitioning:
External table partitioning | Default distribution |
---|---|
External table isn’t partitioned, or partitioned by datetime column only | Export is distributed per_node - all nodes are exporting concurrently. Each node writes the data assigned to that node. The number of files exported by a node is greater than one, only if the size of the data from that node exceeds sizeLimit . |
External table is partitioned by a string column | The data to export is moved between the nodes, such that each node writes a subset of the partition values. A single partition is always written by a single node. The number of files written per partition should be greater than one only if the data exceeds sizeLimit . If the external table includes several string partitions, then data is partitioned between the node based on the first partition. Therefore, the recommendation is to define the partition with most uniform distribution as the first one. |
Change the default distribution settings
Changing the default distribution settings can be useful in the following cases:
Use case | Description | Recommendation |
---|---|---|
Reduce the number of exported files | Export is creating too many small files, and you would like it to create a smaller number of larger files. | Set distribution =single or distributed =false (both are equivalent) in the command properties. Only a single thread performs the export. The downside of this is that the export operation can be slower, as concurrency is much reduced. |
Reduce the export duration | Increasing the concurrency of the export operation, to reduce its duration. | Set distribution =per_shard in the command properties. Doing so means concurrency of the write operations is per data shard, instead of per node. This is only relevant when exporting to an external table that isn’t partitioned by string partition. This might create too much load on storage, potentially resulting in throttling. See Storage failures. |
Reduce the export duration for external tables that are partitioned by a string partition | If the partitions aren’t uniformly distributed between the nodes, export might take a longer time to run. If one partition is much larger than the others, the node assigned to that partition does most of the export work, while the other nodes remain mostly idle. For more information, see Distribution settings. | There are several settings you can change: * If there’s more than one string partition, define the one with best distribution first. * Set distributionKind =uniform in the command properties. This setting disables the default distribution settings for string-partitioned external tables. Export runs with per-node distribution and each node exports the data assigned to the node. A single partition might be written by several nodes, and the number of files increases accordingly. To increase concurrency even further, set distributionKind =uniform along with distribution =per_shard for highest concurrency (at the cost of potentially many more files written)* If the cause for slow export isn’t outliers in the data, reduce duration by increasing concurrency, without changing partitioning settings. Use the hint.spread and hint.concurrency properties, which determine the concurrency of the partitioning. See partition operator. By default, the number of nodes exporting concurrently (the spread ) is the minimum value between 64 and the number of nodes. Setting spread to a higher number than number of nodes increases the concurrency on each node (max value for spread is 64). |
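The following is a hedged sketch of the last recommendation in the table above, applied to a hypothetical string-partitioned external table; PartitionedExternalTable and T are placeholder names.
.export to table PartitionedExternalTable
    with (distributionKind="uniform", distribution="per_shard")
<| T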
Authentication and authorization
In order to export to an external table, you must set up write permissions. For more information, see the Write permissions for Azure Storage external table or SQL Server external table.
Output
Output parameter | Type | Description |
---|---|---|
ExternalTableName | string | The name of the external table. |
Path | string | Output path. |
NumRecords | string | Number of records exported to path. |
Notes
The export query output schema must match the schema of the external table, including all columns defined by the partitions. For example, if the table is partitioned by DateTime, the query output schema must have a Timestamp column matching the TimestampColumnName. This column name is defined in the external table partitioning definition.
It isn’t possible to override the external table properties using the export command. For example, you can’t export data in Parquet format to an external table whose data format is CSV.
If the external table is partitioned, exported artifacts are written to their respective directories according to the partition definitions. For an example, see partitioned external table example.
- If a partition value is null/empty or is an invalid directory value, per the definitions of the target storage, the partition value is replaced with a default value of __DEFAULT_PARTITION__.
For suggestions to overcome storage errors during export commands, see failures during export commands.
External table columns are mapped to suitable target format data types, according to data types mapping rules.
Parquet native export is a more performant, resource-light export mechanism. An exported datetime column is currently unsupported by Synapse SQL COPY.
Number of files
The number of files written per partition depends on the distribution settings of the export operation:
- If the external table includes datetime partitions only, or no partitions at all, the number of files written for each partition that exists should be similar to the number of nodes (or more, if sizeLimit is reached). When the export operation is distributed, all nodes export concurrently. To disable distribution, so that only a single node does the writes, set distributed to false. This process creates fewer files, but reduces the export performance.
- If the external table includes a partition by a string column, the number of exported files should be a single file per partition (or more, if sizeLimit is reached). All nodes still participate in the export (the operation is distributed), but each partition is assigned to a specific node. Setting distributed to false causes only a single node to do the export, but the behavior remains the same (a single file written per partition).
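For example, a minimal sketch that disables distribution so that a single node writes fewer, larger files; MyExternalTable and T are placeholder names.
.export to table MyExternalTable with (distributed=false) <| T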
Examples
Non-partitioned external table example
The following example exports data from table T
to the ExternalBlob
table. ExternalBlob
is a non-partitioned external table.
.export to table ExternalBlob <| T
Output
ExternalTableName | Path | NumRecords |
---|---|---|
ExternalBlob | http://storage1.blob.core.windows.net/externaltable1cont1/1_58017c550b384c0db0fea61a8661333e.csv | 10 |
Partitioned external table example
The following example first creates a partitioned external table, PartitionedExternalBlob
with a specified blob storage location. The data is stored in CSV format with a path format which organizes the data by customer name and date.
.create external table PartitionedExternalBlob (Timestamp:datetime, CustomerName:string)
kind=blob
partition by (CustomerName:string=CustomerName, Date:datetime=startofday(Timestamp))
pathformat = ("CustomerName=" CustomerName "/" datetime_pattern("yyyy/MM/dd", Date))
dataformat=csv
(
h@'http://storageaccount.blob.core.windows.net/container1;secretKey'
)
It then exports data from table T
to the PartitionedExternalBlob
external table.
.export to table PartitionedExternalBlob <| T
Output
If the command is executed asynchronously by using the async
keyword, the output is available using the show operation details command.
Related content
4 - Data export
Data export involves executing a Kusto query and saving its results. This process can be carried out either on the client side or the service side.
For examples on data export, see Related content.
Client-side export
Client-side export gives you control over saving query results either to the local file system or pushing them to a preferred storage location. This flexibility is facilitated by using Kusto client libraries. You can create an app to run queries, read the desired data, and implement an export process tailored to your requirements.
Alternatively, you can use a client tool like the Azure Data Explorer web UI to export data from your Kusto cluster. For more information, see Share queries.
Service-side export (pull)
Use the ingest from query commands to pull query results into a table in the same or different database. See the performance tips before using these commands.
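For illustration, a minimal sketch of a pull-style export using an ingest-from-query command; MyTargetTable and MySourceTable are placeholder names.
.set-or-append MyTargetTable <| MySourceTable | where Timestamp > ago(1d)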
Service-side export (push)
For scalable data export, the service offers various .export
management commands to push query results to cloud storage, an external table, or an SQL table. This approach enhances scalability by avoiding the bottleneck of streaming through a single network connection.
Continuous data export is supported for export to external tables.
Related content
5 - Continuous data export
5.1 - .create or alter continuous-export
Creates or alters a continuous export job.
Permissions
You must have at least Database Admin permissions to run this command.
Syntax
.create-or-alter continuous-export continuousExportName [over ( T1, T2 )] to table externalTableName [with ( propertyName = propertyValue [, ...] )] <| query
Parameters
Name | Type | Required | Description |
---|---|---|---|
continuousExportName | string | ✔️ | The name of the continuous export. Must be unique within the database. |
externalTableName | string | ✔️ | The name of the external table export target. |
query | string | ✔️ | The query to export. |
T1, T2 | string | | A comma-separated list of fact tables in the query. If not specified, all tables referenced in the query are assumed to be fact tables. If specified, tables not in this list are treated as dimension tables and aren’t scoped, so all records participate in all exports. See continuous data export overview for details. |
propertyName, propertyValue | string | | A comma-separated list of optional properties. |
Supported properties
Property | Type | Description |
---|---|---|
intervalBetweenRuns | Timespan | The time span between continuous export executions. Must be greater than 1 minute. |
forcedLatency | Timespan | An optional period of time to limit the query to records ingested before a specified period relative to the current time. This property is useful if, for example, the query performs some aggregations or joins, and you want to make sure all relevant records have been ingested before running the export. |
sizeLimit | long | The size limit in bytes of a single storage artifact written before compression. Valid range: 100 MB (default) to 1 GB. |
distributed | bool | Disable or enable distributed export. Setting to false is equivalent to single distribution hint. Default is true. |
parquetRowGroupSize | int | Relevant only when data format is Parquet. Controls the row group size in the exported files. Default row group size is 100,000 records. |
managedIdentity | string | The managed identity for which the continuous export job runs. The managed identity can be an object ID, or the system reserved word. For more information, see Use a managed identity to run a continuous export job. |
isDisabled | bool | Disable or enable the continuous export. Default is false. |
Example
The following example creates or alters a continuous export MyExport
that exports data from the T
table to ExternalBlob
. The data exports occur every hour, and have a defined forced latency and size limit per storage artifact.
.create-or-alter continuous-export MyExport
over (T)
to table ExternalBlob
with
(intervalBetweenRuns=1h,
forcedLatency=10m,
sizeLimit=104857600)
<| T
Name | ExternalTableName | Query | ForcedLatency | IntervalBetweenRuns | CursorScopedTables | ExportProperties |
---|---|---|---|---|---|---|
MyExport | ExternalBlob | T | 00:10:00 | 01:00:00 | [ "['DB'].['T']" ] | { "SizeLimit": 104857600 } |
Related content
5.2 - .drop continuous-export
Drops a continuous-export job.
Permissions
You must have at least Database Admin permissions to run this command.
Syntax
.drop
continuous-export
ContinuousExportName
Parameters
Name | Type | Required | Description |
---|---|---|---|
ContinuousExportName | string | ✔️ | The name of the continuous export. |
Returns
The remaining continuous exports in the database (post deletion). Output schema as in the show continuous export command.
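For example, the following command drops a continuous export named MyExport (a placeholder name):
.drop continuous-export MyExport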
Related content
5.3 - .show continuous data-export failures
Returns all failures logged as part of the continuous export within the past 14 days. To view only a specific time range, filter the results by the Timestamp column.
The command doesn’t return any results if executed on a follower database; it must be executed against the leader database.
The command doesn’t return any results if executed on a database shortcut; it must be executed against the leader database.
Permissions
You must have at least Database Monitor or Database Admin permissions to run this command. For more information, see role-based access control.
Syntax
.show
continuous-export
ContinuousExportName failures
Parameters
Name | Type | Required | Description |
---|---|---|---|
ContinuousExportName | string | ✔️ | The name of the continuous export. |
Returns
Output parameter | Type | Description |
---|---|---|
Timestamp | datetime | Timestamp of the failure. |
OperationId | string | Operation ID of the failure. |
Name | string | Continuous export name. |
LastSuccessRun | Timestamp | The last successful run of the continuous export. |
FailureKind | string | Failure/PartialFailure. PartialFailure indicates some artifacts were exported successfully before the failure occurred. |
Details | string | Failure error details. |
Example
The following example shows failures from the continuous export MyExport
.
.show continuous-export MyExport failures
Output
Timestamp | OperationId | Name | LastSuccessRun | FailureKind | Details |
---|---|---|---|---|---|
2019-01-01 11:07:41.1887304 | ec641435-2505-4532-ba19-d6ab88c96a9d | MyExport | 2019-01-01 11:06:35.6308140 | Failure | Details… |
Related content
5.4 - .show continuous-export
Returns the properties of a specified continuous export or all continuous exports in the database.
Permissions
You must have at least Database User, Database Viewer, or Database Monitor permissions to run this command. For more information, see role-based access control.
Syntax
.show
continuous-export
ContinuousExportName
.show
continuous-exports
Parameters
Name | Type | Required | Description |
---|---|---|---|
ContinuousExportName | string | ✔️ | The name of the continuous export. |
Returns
Output parameter | Type | Description |
---|---|---|
CursorScopedTables | string | The list of explicitly scoped (fact) tables (JSON serialized). |
ExportProperties | string | The export properties (JSON serialized). |
ExportedTo | datetime | The last datetime (ingestion time) that was exported successfully. |
ExternalTableName | string | The external table name. |
ForcedLatency | timeSpan | The forced latency timespan, if defined. Returns Null if no timespan is defined. |
IntervalBetweenRuns | timeSpan | The interval between runs. |
IsDisabled | bool | A boolean value indicating whether the continuous export is disabled. |
IsRunning | bool | A boolean value indicating whether the continuous export is currently running. |
LastRunResult | string | The results of the last continuous-export run (Completed or Failed ). |
LastRunTime | datetime | The last time the continuous export was executed (start time). |
Name | string | The name of the continuous export. |
Query | string | The export query. |
StartCursor | string | The starting point of the first execution of this continuous export. |
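For example, assuming a continuous export named MyExport, the following commands return its properties, or the properties of all continuous exports in the database:
.show continuous-export MyExport

.show continuous-exports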
Related content
5.5 - .show continuous-export exported-artifacts
Returns all artifacts exported by the continuous-export in all runs. Filter the results by the Timestamp column in the command to view only records of interest. The history of exported artifacts is retained for 14 days.
The command doesn’t return any results if executed on a follower database; it must be executed against the leader database.
The command doesn’t return any results if executed on a database shortcut; it must be executed against the leader database.
Permissions
You must have at least Database Monitor or Database Admin permissions to run this command. For more information, see role-based access control.
Syntax
.show
continuous-export
ContinuousExportName exported-artifacts
Parameters
Name | Type | Required | Description |
---|---|---|---|
ContinuousExportName | string | ✔️ | The name of the continuous export. |
Returns
Output parameter | Type | Description |
---|---|---|
Timestamp | datetime | The timestamp of the continuous export run. |
ExternalTableName | string | The name of the external table. |
Path | string | The output path. |
NumRecords | long | The number of records exported to the path. |
Example
The following example shows retrieved artifacts from the continuous export MyExport
that were exported within the last hour.
.show continuous-export MyExport exported-artifacts | where Timestamp > ago(1h)
Output
Timestamp | ExternalTableName | Path | NumRecords | SizeInBytes |
---|---|---|---|---|
2018-12-20 07:31:30.2634216 | ExternalBlob | http://storageaccount.blob.core.windows.net/container1/1_6ca073fd4c8740ec9a2f574eaa98f579.csv | 10 | 1024 |
Related content
5.6 - Continuous data export
This article describes continuous export of data from Kusto to an external table with a periodically run query. The results are stored in the external table, which defines the destination, such as Azure Blob Storage, and the schema of the exported data. This process guarantees that all records are exported “exactly once”, with some exceptions.
By default, continuous export runs in a distributed mode, where all nodes export concurrently, so the number of artifacts depends on the number of nodes. Continuous export isn’t designed for low-latency streaming data.
To enable continuous data export, create an external table and then create a continuous export definition pointing to the external table.
In some cases, you must use a managed identity to successfully configure a continuous export job. For more information, see Use a managed identity to run a continuous export job.
Permissions
All continuous export commands require at least Database Admin permissions.
Continuous export guidelines
Output schema:
- The output schema of the export query must match the schema of the external table to which you export.
Frequency:
- Continuous export runs according to the time period configured for it in the intervalBetweenRuns property. The recommended value for this interval is at least several minutes, depending on the latencies you’re willing to accept. The time interval can be as low as one minute, if the ingestion rate is high.

[!NOTE] The intervalBetweenRuns serves as a recommendation only, and isn’t guaranteed to be precise. Continuous export isn’t suitable for exporting periodic aggregations. For example, a configuration of intervalBetweenRuns=1h with an hourly aggregation (T | summarize by bin(Timestamp, 1h)) won’t work as expected, since the continuous export won’t run exactly on the hour. Therefore, each hourly bin will receive multiple entries in the exported data.
Number of files:
- The number of files exported in each continuous export iteration depends on how the external table is partitioned. For more information, see the export to external table command. Each continuous export iteration always writes to new files, and never appends to existing ones. As a result, the number of exported files also depends on the frequency in which the continuous export runs. The frequency parameter is intervalBetweenRuns.
External table storage accounts:
- For best performance, the database and the storage accounts should be colocated in the same Azure region.
- Continuous export works in a distributed manner, such that all nodes are exporting concurrently. On large databases, and if the exported data volume is large, this might lead to storage throttling. The recommendation is to configure multiple storage accounts for the external table. For more information, see storage failures during export commands.
Exactly once export
To guarantee “exactly once” export, continuous export uses database cursors. The continuous export query shouldn’t include a timestamp filter - the database cursors mechanism ensures that records aren’t processed more than once. Adding a timestamp filter in the query can lead to missing data in exported data.
IngestionTime policy must be enabled on all tables referenced in the query that should be processed “exactly once” in the export. The policy is enabled by default on all newly created tables.
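If the policy was disabled on an existing table, the following is a minimal sketch of re-enabling it, assuming a table named MyTable:
.alter table MyTable policy ingestiontime true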
The guarantee for “exactly once” export is only for files reported in the show exported artifacts command. Continuous export doesn’t guarantee that each record is written only once to the external table. If a failure occurs after export begins and some of the artifacts were already written to the external table, the external table might contain duplicates. If a write operation was aborted before completion, the external table might contain corrupted files. In such cases, artifacts aren’t deleted from the external table, but they aren’t reported in the show exported artifacts command. Consuming the exported files using the show exported artifacts command
guarantees no duplications and no corruptions.
Export from fact and dimension tables
By default, all tables referenced in the export query are assumed to be fact tables. As such, they’re scoped to the database cursor. The syntax explicitly declares which tables are scoped (fact) and which aren’t scoped (dimension). See the over
parameter in the create command for details.
The export query includes only the records that joined since the previous export execution. The export query might contain dimension tables in which all records of the dimension table are included in all export queries. When using joins between fact and dimension tables in continuous-export, keep in mind that records in the fact table are only processed once. If the export runs while records in the dimension tables are missing for some keys, records for the respective keys are either missed or include null values for the dimension columns in the exported files. Returning missed or null records depends on whether the query uses inner or outer join. The forcedLatency
property in the continuous-export definition can be useful in such cases, where the fact and dimension tables are ingested during the same time for matching records.
Monitor continuous export
Monitor the health of your continuous export jobs using the following export metrics:
- Continuous export max lateness: the maximum lateness (in minutes) of continuous exports in the database. This is the time between now and the minimum ExportedTo time of all continuous export jobs in the database. For more information, see the .show continuous-export command.
- Continuous export result: the success/failure result of each continuous export execution. This metric can be split by the continuous export name.

Use the .show continuous-export failures command to see the specific failures of a continuous export job.
Resource consumption
- The impact of the continuous export on the database depends on the query the continuous export is running. Most resources, such as CPU and memory, are consumed by the query execution.
- The number of export operations that can run concurrently is limited by the database’s data export capacity. For more information, see Management commands throttling. If the database doesn’t have sufficient capacity to handle all continuous exports, some start lagging behind.
- The show commands-and-queries command can be used to estimate resource consumption.
- Filter on | where ClientActivityId startswith "RunContinuousExports" to view the commands and queries associated with continuous export, as shown in the example after this list.
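For example, a minimal sketch of that filter, additionally restricted to the last day:
.show commands-and-queries
| where ClientActivityId startswith "RunContinuousExports"
| where StartedOn > ago(1d)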
Export historical data
Continuous export starts exporting data only from the point of its creation. Records ingested before that time should be exported separately using the non-continuous export command. Historical data might be too large to be exported in a single export command. If needed, partition the query into several smaller batches.
To avoid duplicates with data exported by continuous export, use StartCursor
returned by the show continuous export command and export only records where cursor_before_or_at
the cursor value. For example:
.show continuous-export MyExport | project StartCursor
StartCursor |
---|
636751928823156645 |
Followed by:
.export async to table ExternalBlob
<| T | where cursor_before_or_at("636751928823156645")
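The following is a hedged sketch of batching the historical export by time range while keeping the cursor filter; it assumes T has a Timestamp column, and you would repeat the command with adjacent ranges for each batch.
.export async to table ExternalBlob
<| T
   | where cursor_before_or_at("636751928823156645")
   | where Timestamp between (datetime(2018-01-01) .. datetime(2018-07-01))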
Continuous export from a table with Row Level Security
To create a continuous export job with a query that references a table with Row Level Security policy, you must:
- Provide a managed identity as part of the continuous export configuration. For more information, see Use a managed identity to run a continuous export job.
- Use impersonation authentication for the external table to which the data is exported.
Continuous export to delta table - Preview
Continuous export to a delta table is currently in preview.
To define continuous export to a delta table, do the following steps:
1. Create an external delta table, as described in Create and alter delta external tables on Azure Storage.

   [!NOTE] If the schema isn’t provided, Kusto tries to infer it automatically if there is already a delta table defined in the target storage container. Delta table partitioning isn’t supported.

2. Define continuous export to this table using the commands described in Create or alter continuous export.

   [!IMPORTANT] The schema of the delta table must be in sync with the continuous export query. If the underlying delta table changes, the export might start failing with unexpected behavior.
Limitations
General:
- The following formats are allowed on target tables: CSV, TSV, JSON, and Parquet.
- Continuous export isn’t designed to work over materialized views, since a materialized view might be updated, while data exported to storage is always appended and never updated.
- Continuous export can’t be created on follower databases since follower databases are read-only and continuous export requires write operations.
- Records in the source table must be ingested to the table directly, using an update policy, or ingest from query commands. If records are moved into the table using .move extents or using .rename table, continuous export might not process these records. See the limitations described in the Database Cursors page.
- If the artifacts used by continuous export are intended to trigger Event Grid notifications, see the known issues section in the Event Grid documentation.
Cross-database and cross-cluster:
- Continuous export doesn’t support cross-cluster calls.
- Continuous export supports cross-database calls only for dimension tables. All fact tables must reside in the local database. See more details in Export from fact and dimension tables.
- If the continuous export includes cross-database calls, it must be configured with a managed identity.
Cross-database and cross-Eventhouse:
- Continuous export doesn’t support cross-Eventhouse calls.
- Continuous export supports cross-database calls only for dimension tables. All fact tables must reside in the local database. See more details in Export from fact and dimension tables.
Policies:
- Continuous export can’t be enabled on a table with Row Level Security policy unless specific conditions are met. For more information, see Continuous export from a table with Row Level Security.
- Continuous export can’t be configured on a table with restricted view access policy.
Related content
5.7 - Enable or disable continuous data export
Disables or enables the continuous-export job. A disabled continuous export isn’t executed, but its current state is persisted and can be resumed when the continuous export is enabled.
When you enable a continuous export that was disabled for a long time, exporting resumes from the point at which it stopped when it was disabled. This continuation might result in a long-running export, blocking other exports from running if there isn’t sufficient database capacity to serve all processes. Continuous exports are executed by last run time in ascending order, so the oldest export runs first, until catch-up is complete.
Permissions
You must have at least Database Admin permissions to run these commands.
Syntax
.enable
continuous-export
ContinuousExportName
.disable
continuous-export
ContinuousExportName
Parameters
Name | Type | Required | Description |
---|---|---|---|
ContinuousExportName | string | ✔️ | The name of the continuous export. |
Returns
The result of the show continuous export command of the altered continuous export.
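For example, the following commands disable and then re-enable a continuous export named MyExport (a placeholder name):
.disable continuous-export MyExport

.enable continuous-export MyExport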
Related content
5.8 - Use a managed identity to run a continuous export job
A continuous export job exports data to an external table with a periodically run query.
The continuous export job should be configured with a managed identity in the following scenarios:
- When the external table uses impersonation authentication
- When the query references tables in other databases
- When the query references tables with an enabled row level security policy
A continuous export job configured with a managed identity is performed on behalf of the managed identity.
In this article, you learn how to configure a system-assigned or user-assigned managed identity and create a continuous export job using that identity.
Prerequisites
- A cluster and database. Create a cluster and database.
- All Databases Admin permissions on the database.
Configure a managed identity
There are two types of managed identities:
System-assigned: A system-assigned identity is connected to your cluster and is removed when the cluster is removed. Only one system-assigned identity is allowed per cluster.
User-assigned: A user-assigned managed identity is a standalone Azure resource. Multiple user-assigned identities can be assigned to your cluster.
Select one of the following tabs to set up your preferred managed identity type.
User-assigned
Follow the steps to Add a user-assigned identity.
In the Azure portal, in the left menu of your managed identity resource, select Properties. Copy and save the Tenant Id and Principal Id for use in the following steps.
Run the following .alter-merge policy managed_identity command, replacing <objectId> with the managed identity object ID from the previous step. This command sets a managed identity policy on the cluster that allows the managed identity to be used with continuous export.

.alter-merge cluster policy managed_identity ```[ { "ObjectId": "<objectId>", "AllowedUsages": "AutomatedFlows" } ]```

[!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

Run the following command to grant the managed identity Database Viewer permissions over all databases used for the continuous export, such as the database that contains the external table.

.add database <DatabaseName> viewers ('aadapp=<objectId>;<tenantId>')

Replace <DatabaseName> with the relevant database, <objectId> with the managed identity Principal Id from step 2, and <tenantId> with the Microsoft Entra ID Tenant Id from step 2.
System-assigned
Follow the steps to Add a system-assigned identity.
Copy and save the Object (principal) ID for use in a later step.
Run the following .alter-merge policy managed_identity command. This command sets a managed identity policy on the cluster that allows the managed identity to be used with continuous export.
.alter-merge cluster policy managed_identity ```[ { "ObjectId": "system", "AllowedUsages": "AutomatedFlows" } ]```
[!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

Run the following command to grant the managed identity Database Viewer permissions over all databases used for the continuous export, such as the database that contains the external table.
.add database <DatabaseName> viewers ('aadapp=<objectId>')
Replace <DatabaseName> with the relevant database and <objectId> with the managed identity Object (principal) ID from step 2.
Set up an external table
External tables refer to data located in Azure Storage, such as Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, or SQL Server.
Select one of the following tabs to set up an Azure Storage or SQL Server external table.
Azure Storage
Create a connection string based on the storage connection string templates. This string indicates the resource to access and its authentication information. For continuous export flows, we recommend impersonation authentication.
Run the .create or .alter external table command to create the table. Use the connection string from the previous step as the storageConnectionString argument.
For example, the following command creates MyExternalTable that refers to CSV-formatted data in mycontainer of mystorageaccount in Azure Blob Storage. The table has two columns, one for an integer x and one for a string s. The connection string ends with ;impersonate, which indicates to use impersonation authentication to access the data store.

.create external table MyExternalTable (x:int, s:string) kind=storage dataformat=csv ( h@'https://mystorageaccount.blob.core.windows.net/mycontainer;impersonate' )
Grant the managed identity write permissions over the relevant external data store. The managed identity needs write permissions because the continuous export job exports data to the data store on behalf of the managed identity.
External data store | Required permissions | Grant the permissions |
---|---|---|
Azure Blob Storage | Storage Blob Data Contributor | Assign an Azure role |
Data Lake Storage Gen2 | Storage Blob Data Contributor | Assign an Azure role |
Data Lake Storage Gen1 | Contributor | Assign an Azure role |
SQL Server
Create a SQL Server connection string. This string indicates the resource to access and its authentication information. For continuous export flows, we recommend Microsoft Entra integrated authentication, which is impersonation authentication.
Run the .create or .alter external table command to create the table. Use the connection string from the previous step as the sqlServerConnectionString argument.
For example, the following command creates MySqlExternalTable that refers to the MySqlTable table in MyDatabase of SQL Server. The table has two columns, one for an integer x and one for a string s. The connection string contains ;Authentication=Active Directory Integrated, which indicates to use impersonation authentication to access the table.

.create external table MySqlExternalTable (x:int, s:string) kind=sql table=MySqlTable ( h@'Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=MyDatabase;' )
Grant the managed identity CREATE, UPDATE, and INSERT permissions over the SQL Server database. The managed identity needs write permissions because the continuous export job exports data to the database on behalf of the managed identity. To learn more, see Permissions.
Create a continuous export job
Select one of the following tabs to create a continuous export job that runs on behalf of a user-assigned or system-assigned managed identity.
User-assigned
Run the .create-or-alter continuous-export command with the managedIdentity
property set to the managed identity object ID.
For example, the following command creates a continuous export job named MyExport
to export the data in MyTable
to MyExternalTable
on behalf of a user-assigned managed identity. <objectId>
should be a managed identity object ID.
.create-or-alter continuous-export MyExport over (MyTable) to table MyExternalTable with (managedIdentity=<objectId>, intervalBetweenRuns=5m) <| MyTable
System-assigned
Run the .create-or-alter continuous-export command with the managedIdentity
property set to system
.
For example, the following command creates a continuous export job named MyExport
to export the data in MyTable
to MyExternalTable
on behalf of your system-assigned managed identity.
.create-or-alter continuous-export MyExport over (MyTable) to table MyExternalTable with (managedIdentity="system", intervalBetweenRuns=5m) <| MyTable