
Management

1 - Advanced data management

1.1 - .clear cluster cache external-artifacts command

Learn how to use the .clear cluster cache external-artifacts command to clear cached external-artifacts of language plugins.

Clears cached external-artifacts of language plugins.

This command is useful when you update external-artifact files stored in external storage, as the cache may retain the previous versions. In such scenarios, executing this command will clear the cache entries and ensure that subsequent queries run with the latest version of the artifacts.

Permissions

You must have at least Database Admin permissions to run this command.

Syntax

.clear cluster cache external-artifacts ( ArtifactURI [, … ] )

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| ArtifactURI | string | ✔️ | The URI for the external artifact to clear from the cache. |

Returns

This command returns a table with the following columns:

| Column | Type | Description |
|--|--|--|
| ExternalArtifactUri | string | The external artifact URI. |
| State | string | The result of the clear operation on the external artifact. |

Example

.clear cluster cache external-artifacts ("https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r", "https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py")
| ExternalArtifactUri | State |
|--|--|
| https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r | Cleared successfully on all nodes |
| https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py | Cleared successfully on all nodes |

1.2 - Data purge

This article describes Data purge.

The data platform supports the ability to delete individual records, by using Kusto .purge and related commands. You can also purge an entire table or purge records in a materialized view.

Purge guidelines

Carefully design your data schema and investigate relevant policies before storing personal data.

  1. In a best-case scenario, the retention period on this data is sufficiently short and data is automatically deleted.
  2. If retention period usage isn’t possible, isolate all data that is subject to privacy rules in a few tables. Optimally, use just one table and link to it from all other tables. This isolation allows you to run the data purge process on a few tables holding sensitive data, and avoid all other tables.
  3. The caller should make every attempt to batch the execution of .purge commands to 1-2 commands per table per day. Don’t issue multiple commands with unique user identity predicates. Instead, send a single command whose predicate includes all user identities that require purging.
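
For example, rather than issuing one command per identity, a single batched command like the following hedged sketch covers several identities at once (the table, database, and column names are hypothetical placeholders):

// One batched purge covering several user identities (single-step mode)
.purge table MyTable records in database MyDatabase with (noregrets='true') <| where UserId in ('id1', 'id2', 'id3')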

Purge process

The process of selectively purging data happens in the following steps:

  1. Phase 1: Provide input with a table name and a per-record predicate that indicates which records to delete. Kusto scans the table to identify data extents that would participate in the data purge. The extents identified are those having one or more records for which the predicate returns true.

  2. Phase 2: (Soft Delete) Replace each data extent in the table (identified in step (1)) with a reingested version. The reingested version shouldn’t have the records for which the predicate returns true. If new data isn’t being ingested into the table, then by the end of this phase, queries will no longer return data for which the predicate returns true. The duration of the purge soft delete phase depends on the following parameters:

    • The number of records that must be purged
    • Record distribution across the data extents in the cluster
    • The number of nodes in the cluster
    • The cluster's spare capacity for purge operations
    • Several other factors

    The duration of phase 2 can vary between a few seconds to many hours.

  3. Phase 3: (Hard Delete) Work back all storage artifacts that may have the “poison” data, and delete them from storage. This phase is done at least five days after the completion of the previous phase, but no longer than 30 days after the initial command. These timelines are set to follow data privacy requirements.

Issuing a .purge command triggers this process, which takes a few days to complete. If the density of records for which the predicate applies is sufficiently large, the process will effectively reingest all the data in the table. This reingestion has a significant impact on performance and COGS (cost of goods sold).

Purge limitations and considerations

  • The purge process is final and irreversible. It isn’t possible to undo this process or recover data that has been purged. Commands such as undo table drop can’t recover purged data. Rollback of the data to a previous version can’t go to before the latest purge command.

  • Before running the purge, verify the predicate by running a query and checking that the results match the expected outcome. You can also use the two-step process that returns the expected number of records that will be purged.

  • The .purge command is executed against the Data Management endpoint: https://ingest-[YourClusterName].[region].kusto.windows.net. The command requires database admin permissions on the relevant databases.

  • Due to the purge process performance impact, and to guarantee that purge guidelines have been followed, the caller is expected to modify the data schema so that minimal tables include relevant data, and batch commands per table to reduce the significant COGS impact of the purge process.

  • The predicate parameter of the .purge command is used to specify which records to purge. Predicate size is limited to 1 MB. When constructing the predicate:

    • Use the 'in' operator, for example, where [ColumnName] in ('Id1', 'Id2', .. , 'Id1000').
    • Note the limits of the 'in' operator (the list can contain up to 1,000,000 values).
    • If the query size is large, use the externaldata operator, for example where UserId in (externaldata(UserId:string) ["https://...blob.core.windows.net/path/to/file?..."]). The file stores the list of IDs to purge. (See the sketch after this list.)
    • The total query size, after expanding all externaldata blobs (total size of all blobs), can’t exceed 64 MB.
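
As an illustration of these guidelines, the following hedged sketch first validates the predicate by counting the matching records, and then purges using an externaldata-based list of identifiers; the table and column names are hypothetical, and the blob URL is a placeholder:

// Validate the predicate: count how many records would be purged
MyTable
| where UserId in ('Id1', 'Id2')
| count

// Purge using a list of identifiers stored in external storage (placeholder URL)
.purge table MyTable records in database MyDatabase <| where UserId in (externaldata(UserId:string) ["https://...blob.core.windows.net/path/to/file?..."])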

Purge performance

Only one purge request can be executed on the cluster at any given time. All other requests are queued in the Scheduled state. Monitor the purge request queue size, and keep it within the limits appropriate for your data requirements.
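
One hedged way to keep an eye on this backlog, using the .show purges command described later in this article, is to count the operations that are still queued:

// Count purge operations still waiting to run (defaults to the last 24 hours)
.show purges
| where State == 'Scheduled'
| count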

To reduce purge execution time, follow the purge guidelines described earlier in this article: isolate data that is subject to purging in as few tables as possible, and batch predicates into as few .purge commands as possible.

Trigger the purge process

Purge table TableName records command

The purge command can be invoked in two ways, for differing usage scenarios:

  • Programmatic invocation: A single step that is intended to be invoked by applications. Calling this command directly triggers the purge execution sequence.

    Syntax

    // Connect to the Data Management service
    #connect "https://ingest-[YourClusterName].[region].kusto.windows.net"
    
    // To purge table records
    .purge table [TableName] records in database [DatabaseName] with (noregrets='true') <| [Predicate]
    
     // To purge materialized view records
    .purge materialized-view [MaterializedViewName] records in database [DatabaseName] with (noregrets='true') <| [Predicate]
    
  • Human invocation: A two-step process that requires an explicit confirmation as a separate step. First invocation of the command returns a verification token, which should be provided to run the actual purge. This sequence reduces the risk of inadvertently deleting incorrect data.

[!NOTE] The first step in the two-step invocation requires running a query on the entire dataset to identify the records to be purged. This query may time out or fail on large tables, especially with a significant amount of cold cache data. If it fails, validate the predicate yourself and, after verifying correctness, use the single-step purge with the noregrets option.

Syntax

// Connect to the Data Management service - this command only works in Kusto.Explorer
#connect "https://ingest-[YourClusterName].[region].kusto.windows.net"

// Step #1 - retrieve a verification token (no records will be purged until step #2 is executed)
.purge table [TableName] records in database [DatabaseName] <| [Predicate]

// Step #2 - input the verification token to execute purge
.purge table [TableName] records in database [DatabaseName] with (verificationtoken=h'<verification token from step #1>') <| [Predicate]

To purge a materialized view, replace the table keyword with materialized-view, and replace TableName with the MaterializedViewName.

| Parameters | Description |
|--|--|
| DatabaseName | Name of the database. |
| TableName / MaterializedViewName | Name of the table / materialized view to purge. |
| Predicate | Identifies the records to purge. See purge predicate limitations. |
| noregrets | If set, triggers a single-step activation. |
| verificationtoken | In the two-step activation scenario (noregrets isn't set), this token can be used to execute the second step and commit the action. If verificationtoken isn't specified, the command's first step is triggered. In this step, information about the purge is returned together with a token that should be passed back to the command to execute step #2. |

Purge predicate limitations

  • The predicate must be a simple selection (for example, where [ColumnName] == 'X' / where [ColumnName] in ('X', 'Y', 'Z') and [OtherColumn] == 'A').
  • Multiple filters must be combined with an 'and', rather than separate where clauses (for example, use where [ColumnName] == 'X' and [OtherColumn] == 'Y', rather than where [ColumnName] == 'X' | where [OtherColumn] == 'Y').
  • The predicate can’t reference tables other than the table being purged (TableName). The predicate can only include the selection statement (where). It can’t project specific columns from the table (output schema when running ‘table | Predicate’ must match table schema).
  • System functions (such as, ingestion_time(), extent_id()) aren’t supported.

Example: Two-step purge

To start purge in a two-step activation scenario, run step #1 of the command:

   // Connect to the Data Management service
   #connect "https://ingest-[YourClusterName].[region].kusto.windows.net"

   .purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')

   .purge materialized-view MyView records in database MyDatabase <| where CustomerId in ('X', 'Y')

Output

| NumRecordsToPurge | EstimatedPurgeExecutionTime | VerificationToken |
|--|--|--|
| 1,596 | 00:00:02 | e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b |

Then, validate the NumRecordsToPurge before running step #2.

To complete a purge in a two-step activation scenario, use the verification token returned from step #1 to run step #2:

.purge table MyTable records in database MyDatabase
 with(verificationtoken=h'e43c7....')
<| where CustomerId in ('X', 'Y')

.purge materialized-view MyView records in database MyDatabase
 with(verificationtoken=h'e43c7....')
<| where CustomerId in ('X', 'Y')

Output

| OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
| c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:00.1406211 | 2019-01-20 11:41:05.4391686 | | Scheduled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |

Example: Single-step purge

To trigger a purge in a single-step activation scenario, run the following command:

// Connect to the Data Management service
 #connect "https://ingest-[YourClusterName].[region].kusto.windows.net"

.purge table MyTable records in database MyDatabase with (noregrets='true') <| where CustomerId in ('X', 'Y')

.purge materialized-view MyView records in database MyDatabase with (noregrets='true') <| where CustomerId in ('X', 'Y')

Output

| OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
| c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:00.1406211 | 2019-01-20 11:41:05.4391686 | | Scheduled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |

Cancel purge operation command

If needed, you can cancel pending purge requests.

Syntax

 // Cancel of a single purge operation
 .cancel purge <OperationId>

  // Cancel of all pending purge requests in a database
 .cancel all purges in database <DatabaseName>

 // Cancel of all pending purge requests, for all databases
 .cancel all purges

Example: Cancel a single purge operation

 .cancel purge aa894210-1c60-4657-9d21-adb2887993e1

Output

The output of this command is the same as the ‘show purges OperationId’ command output, showing the updated status of the purge operation being canceled. If the attempt is successful, the operation state is updated to Canceled. Otherwise, the operation state isn’t changed.

| OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
| c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:00.1406211 | 2019-01-20 11:41:05.4391686 | | Canceled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |

Example: Cancel all pending purge operations in a database

 .cancel all purges in database MyDatabase

Output

The output of this command is the same as the show purges command output, showing all operations in the database with their updated status. Operations that were canceled successfully will have their status updated to Canceled. Otherwise, the operation state isn’t changed.

| OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
| 5a34169e-8730-49f5-9694-7fde3a7a0139 | MyDatabase | MyTable | 2021-03-03 05:07:29.7050198 | 00:00:00.2971331 | 2021-03-03 05:07:30.0021529 | | Canceled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |
| 2fa7c04c-6364-4ce1-a5e5-1ab921f518f5 | MyDatabase | MyTable | 2021-03-03 05:05:03.5035478 | 00:00:00.1406211 | 2021-03-03 05:05:03.6441689 | | InProgress | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |

Track purge operation status

Status = 'Completed' indicates successful completion of the first phase of the purge operation; that is, records are soft-deleted and are no longer available for querying. Customers aren't expected to track and verify the second phase (hard-delete) completion. This phase is monitored internally.

Show purges command

The .show purges command shows the status of purge operations, either for a specific operation ID or within a requested time period.

.show purges <OperationId>
.show purges [in database <DatabaseName>]
.show purges from '<StartDate>' [in database <DatabaseName>]
.show purges from '<StartDate>' to '<EndDate>' [in database <DatabaseName>]

| Properties | Description | Mandatory/Optional |
|--|--|--|
| OperationId | The Data Management operation ID returned after executing the single-phase command or the second phase of the two-step command. | Mandatory |
| StartDate | Lower time limit for filtering operations. If omitted, defaults to 24 hours before the current time. | Optional |
| EndDate | Upper time limit for filtering operations. If omitted, defaults to the current time. | Optional |
| DatabaseName | Database name to filter results. | Optional |

Examples

.show purges
.show purges c9651d74-3b80-4183-90bb-bbe9e42eadc4
.show purges from '2018-01-30 12:00'
.show purges from '2018-01-30 12:00' to '2018-02-25 12:00'
.show purges from '2018-01-30 12:00' to '2018-02-25 12:00' in database MyDatabase

Output

| OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
| c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:33.6782130 | 2019-01-20 11:42:34.6169153 | a0825d4d-6b0f-47f3-a499-54ac5681ab78 | Completed | Purge completed successfully (storage artifacts pending deletion) | 2019-01-20 11:41:34.6486506 | 00:00:04.4687310 | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |

  • OperationId - the DM operation ID returned when executing purge.
  • DatabaseName - database name (case sensitive).
  • TableName - table name (case sensitive).
  • ScheduledTime - time of executing purge command to the DM service.
  • Duration - total duration of the purge operation, including the execution DM queue wait time.
  • EngineOperationId - the operation ID of the actual purge executing in the engine.
  • State - purge state, can be one of the following values:
    • Scheduled - purge operation is scheduled for execution. If job remains Scheduled, there’s probably a backlog of purge operations. See purge performance to clear this backlog. If a purge operation fails on a transient error, it will be retried by the DM and set to Scheduled again (so you may see an operation transition from Scheduled to InProgress and back to Scheduled).
    • InProgress - the purge operation is in-progress in the engine.
    • Completed - purge completed successfully.
    • BadInput - purge failed on bad input and won’t be retried. This failure may be due to various issues such as a syntax error in the predicate, an illegal predicate for purge commands, a query that exceeds limits (for example, over 1M entities in an externaldata operator or over 64 MB of total expanded query size), and 404 or 403 errors for externaldata blobs.
    • Failed - purge failed and won’t be retried. This failure may happen if the operation was waiting in the queue for too long (over 14 days), due to a backlog of other purge operations or a number of failures that exceed the retry limit. The latter will raise an internal monitoring alert and will be investigated by the team.
  • StateDetails - a description of the State.
  • EngineStartTime - the time the command was issued to the engine. If there’s a large difference between this time and ScheduledTime, there’s usually a significant backlog of purge operations and the cluster isn’t keeping up with the pace.
  • EngineDuration - time of actual purge execution in the engine. If purge was retried several times, it’s the sum of all the execution durations.
  • Retries - number of times the operation was retried by the DM service due to a transient error.
  • ClientRequestId - client activity ID of the DM purge request.
  • Principal - identity of the purge command issuer.

Purging an entire table

Purging a table includes dropping the table, and marking it as purged so that the hard delete process described in Purge process runs on it. Dropping a table without purging it doesn’t delete all its storage artifacts. These artifacts are deleted according to the hard retention policy initially set on the table. The purge table allrecords command is quick and efficient and is preferable to the purge records process, if applicable for your scenario.

Purge table TableName allrecords command

Similar to the '.purge table records' command, this command can be invoked in a programmatic (single-step) or in a manual (two-step) mode.

  1. Programmatic invocation (single-step):

    Syntax

    // Connect to the Data Management service
    #connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"
    
    .purge table [TableName] in database [DatabaseName] allrecords with (noregrets='true')
    
  2. Human invocation (two-steps):

    Syntax

    
    // Connect to the Data Management service
    #connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"
    
    // Step #1 - retrieve a verification token (the table will not be purged until step #2 is executed)
    
    .purge table [TableName] in database [DatabaseName] allrecords
    
    // Step #2 - input the verification token to execute purge
    .purge table [TableName] in database [DatabaseName] allrecords with (verificationtoken=h'<verification token from step #1>')
    
    | Parameters | Description |
    |--|--|
    | DatabaseName | Name of the database. |
    | TableName | Name of the table. |
    | noregrets | If set, triggers a single-step activation. |
    | verificationtoken | In the two-step activation scenario (noregrets isn't set), this token can be used to execute the second step and commit the action. If verificationtoken isn't specified, the command's first step is triggered. In this step, a token is returned to pass back to the command to execute step #2. |

Example: Two-step purge

  1. To start purge in a two-step activation scenario, run step #1 of the command:

    // Connect to the Data Management service
     #connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"
    
    .purge table MyTable in database MyDatabase allrecords
    

    Output

    | VerificationToken |
    |--|
    | e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b |
  2. To complete a purge in a two-step activation scenario, use the verification token returned from step #1 to run step #2:

    .purge table MyTable in database MyDatabase allrecords
    with (verificationtoken=h'eyJT.....')
    

    The output is the same as the ‘.show tables’ command output (returned without the purged table).

    Output

    | TableName | DatabaseName | Folder | DocString |
    |--|--|--|--|
    | OtherTable | MyDatabase | | |

Example: Single-step purge

To trigger a purge in a single-step activation scenario, run the following command:

// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"

.purge table MyTable in database MyDatabase allrecords with (noregrets='true')

The output is the same as the ‘.show tables’ command output (returned without the purged table).

Output

| TableName | DatabaseName | Folder | DocString |
|--|--|--|--|
| OtherTable | MyDatabase | | |

1.3 - Delete data

This article describes delete scenarios, including purge, dropping extents and retention based deletes.

Deleting data from a table is supported in several ways. Use the following information to help you choose which deletion method is best for your use case.

| Use case | Considerations | Method |
|--|--|--|
| Delete all data from a table. | | Use the .clear table data command |
| Routinely delete old data. | Use if you need an automated deletion solution. | Use a retention policy |
| Bulk delete specific data by extents. | Only use if you're an expert user. | Use the .drop extents command |
| Delete records based on their content. | - Storage artifacts that contain the deleted records aren't necessarily deleted.<br>- Deleted records can't be recovered (regardless of any retention or recoverability settings).<br>- Use if you need a quick way to delete records. | Use soft delete |
| Delete records based on their content. | - Storage artifacts that contain the deleted records are deleted.<br>- Deleted records can't be recovered (regardless of any retention or recoverability settings).<br>- Requires significant system resources and time to complete. | Use purge |

The following sections describe the different deletion methods.

Delete all data in a table

To delete all data in a table, use the .clear table data command. This command is the most efficient way to remove all data from a table.

Syntax:

.clear table <TableName> data
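
For example, assuming a hypothetical table named MyTable, the following clears all of its data while keeping the table itself and its schema in place:

.clear table MyTable data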

Delete data using a retention policy

Automatically delete data based on a retention policy. You can set the retention policy at the database or table level. There's no guarantee as to when the deletion occurs, but data isn't deleted before the end of the retention period. This is an efficient and convenient way to remove old data.

Consider a database or table that is set for 90 days of retention. If only 60 days of data are needed, delete the older data as follows:

.alter-merge database <DatabaseName> policy retention softdelete = 60d

.alter-merge table <TableName> policy retention softdelete = 60d
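
For example, the following hedged sketch applies a 60-day soft-delete period to a hypothetical table MyTable and then displays the effective policy to verify the change:

.alter-merge table MyTable policy retention softdelete = 60d

// Verify the policy that is now in effect
.show table MyTable policy retention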

Delete data by dropping extents

An extent (data shard) is the internal structure in which data is stored. Each extent can hold up to millions of records. Extents can be deleted individually or as a group by using the drop extent(s) commands.

Examples

You can delete all rows in a table or just a specific extent.

  • Delete all rows in a table:

    .drop extents from TestTable
    
  • Delete a specific extent:

    .drop extent e9fac0d2-b6d5-4ce3-bdb4-dea052d13b42
    

Delete individual rows

Both purge and soft delete can be used for deleting individual rows. Soft delete doesn't necessarily delete the storage artifacts that contain the records to delete, whereas purge does delete all such storage artifacts.

Both methods prevent deleted records from being recovered, regardless of any retention or recoverability settings. The deletion process is final and irreversible.

Soft delete

With soft delete, data isn’t necessarily deleted from storage artifacts. This method marks all matching records as deleted, so that they’ll be filtered out in queries, and doesn’t require significant system resources.

Purge

With purge, extents that have one or more records to be deleted are replaced with new extents in which those records don't exist. This deletion process isn't immediate, requires significant system resources, and can take a whole day to complete.


1.4 - Follower commands

Learn how to use follower commands to manage your follower configuration.

Management commands for managing your follower configuration. These commands run synchronously but are applied on the next periodic schema refresh, which may result in a short delay until the new configuration is applied.

The follower commands include database level commands and table level commands.

Permissions

You must have at least Database Admin permissions to run these commands.

Database policy overrides

A leader database can override the following database-level policies in the follower cluster: Caching policy and Authorized principals.

Caching policy

The default caching policy for the follower cluster uses the leader cluster database and table-level caching policies.

| Option | Description |
|--|--|
| None | The caching policies used are those policies defined in the source database in the leader cluster. |
| replace | The database and table-level caching policies from the source database in the leader cluster are removed (set to null). They are replaced by the database and table-level override policies, if defined. |
| union (default) | The database and table-level caching policies from the source database in the leader cluster are combined with the policies defined in the database and table-level override policies. |

Authorized principals

| Option | Description |
|--|--|
| None | The authorized principals are defined in the source database of the leader cluster. |
| replace | The override authorized principals replace the authorized principals from the source database in the leader cluster. |
| union (default) | The override authorized principals are combined with the authorized principals from the source database in the leader cluster. |

Table and materialized views policy overrides

By default, tables and materialized views in a database that is being followed by a follower cluster keep the source entity’s caching policy. However, table and materialized view caching policies can be overridden in the follower cluster. Use the replace option to override the source entity’s caching policy.
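
For example, the following hedged sketch, using commands described later in this article, overrides the caching policy of a single table on a hypothetical follower database and makes the override replace the source policy rather than combine with it:

// Override one table's caching policy on the follower database
.alter follower database MyDB table MyTable policy caching hot = 7d

// Have the overrides replace, rather than union with, the leader's policies
.alter follower database MyDB caching-policies-modification-kind = replace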

Database level commands

.show follower database

Shows a database (or databases) followed from another leader cluster that has one or more database-level overrides configured.

Syntax

.show follower database DatabaseName

.show follower databases (DatabaseName1, ..., DatabaseNameN)

Output

| Output parameter | Type | Description |
|--|--|--|
| DatabaseName | string | The name of the database being followed. |
| LeaderClusterMetadataPath | string | The path to the leader cluster's metadata container. |
| CachingPolicyOverride | string | An override caching policy for the database, serialized as JSON, or null. |
| AuthorizedPrincipalsOverride | string | An override collection of authorized principals for the database, serialized as JSON, or null. |
| AuthorizedPrincipalsModificationKind | string | The modification kind to apply using AuthorizedPrincipalsOverride (none, union, or replace). |
| CachingPoliciesModificationKind | string | The modification kind to apply using database or table-level caching policy overrides (none, union, or replace). |
| IsAutoPrefetchEnabled | bool | Whether new data is pre-fetched upon each schema refresh. |
| TableMetadataOverrides | string | If defined, a JSON serialization of table-level property overrides. |

.alter follower database policy caching

Alters a follower database caching policy, to override the one set on the source database in the leader cluster.

Syntax

.alter follower database DatabaseName policy caching hot = HotDataSpan

Example

.alter follower database MyDb policy caching hot = 7d

.delete follower database policy caching

Deletes a follower database override caching policy. This deletion makes the policy set on the source database in the leader cluster the effective one.

Syntax

.delete follower database DatabaseName policy caching

Example

.delete follower database MyDB policy caching

.add follower database principals

Adds authorized principal(s) to the follower database collection of override authorized principals.

Syntax

.add follower database DatabaseName (admins | users | viewers | monitors) Role (principal1, ..., principalN) ['notes']

Example

.add follower database MyDB viewers ('aadgroup=mygroup@microsoft.com') 'My Group'

.drop follower database principals

Drops authorized principal(s) from the follower database collection of override authorized principals.

Syntax

.drop follower database DatabaseName (admins | users | viewers | monitors) (principal1, ..., principalN)

Example

.drop follower database MyDB viewers ('aadgroup=mygroup@microsoft.com')

.alter follower database principals-modification-kind

Alters the follower database authorized principals modification kind.

Syntax

.alter follower database DatabaseName principals-modification-kind = (none | union | replace)

Example

.alter follower database MyDB principals-modification-kind = union

.alter follower database caching-policies-modification-kind

Alters the caching policies modification kind for the follower database, table, and materialized views.

Syntax

.alter follower database DatabaseName caching-policies-modification-kind = (none | union | replace)

Example

.alter follower database MyDB caching-policies-modification-kind = union

.alter follower database prefetch-extents

The follower cluster can wait for new data to be fetched from the underlying storage to the nodes’ SSD (cache) before making this data queryable.

The following command alters the follower database configuration of pre-fetching new extents upon each schema refresh.

Syntax

.alter follower database DatabaseName prefetch-extents = (true | false)

Example

.alter follower database MyDB prefetch-extents = false

Tables and materialized views commands

Alter follower table or materialized view caching policy

Alters a table’s or a materialized view’s caching policy on the follower database, to override the policy set on the source database in the leader cluster.

Syntax

.alter follower database DatabaseName table TableName policy caching hot = HotDataSpan

.alter follower database DatabaseName tables (TableName1, ..., TableNameN) policy caching hot = HotDataSpan

.alter follower database DatabaseName materialized-view ViewName policy caching hot = HotDataSpan

.alter follower database DatabaseName materialized-views (ViewName1, ..., ViewNameN) policy caching hot = HotDataSpan

Examples

.alter follower database MyDb tables (Table1, Table2) policy caching hot = 7d

.alter follower database MyDb materialized-views (View1, View2) policy caching hot = 7d

Delete follower table or materialized view caching policy

Deletes an override for a table’s or a materialized-view’s caching policy on the follower database. The policy set on the source database in the leader cluster will now be the effective policy.

Syntax

.delete follower database DatabaseName table TableName policy caching

.delete follower database DatabaseName tables (TableName1, ..., TableNameN) policy caching

.delete follower database DatabaseName materialized-view ViewName policy caching

.delete follower database DatabaseName materialized-views (ViewName1, ..., ViewNameN) policy caching

Example

.delete follower database MyDB tables (Table1, Table2) policy caching

.delete follower database MyDB materialized-views (View1, View2) policy caching

Sample configuration

The following are sample steps to configure a follower database.

In this example:

  • Our follower cluster, MyFollowerCluster, will be following database MyDatabase from the leader cluster, MyLeaderCluster.

    • MyDatabase has N tables: MyTable1, MyTable2, MyTable3, … MyTableN (N > 3).
    • On MyLeaderCluster:

      | MyTable1 caching policy | MyTable2 caching policy | MyTable3…MyTableN caching policy | MyDatabase Authorized principals |
      |--|--|--|--|
      | hot data span = 7d | hot data span = 30d | hot data span = 365d | Viewers = aadgroup=scubadivers@contoso.com; Admins = aaduser=jack@contoso.com |

    • On MyFollowerCluster we want:

      | MyTable1 caching policy | MyTable2 caching policy | MyTable3…MyTableN caching policy | MyDatabase Authorized principals |
      |--|--|--|--|
      | hot data span = 1d | hot data span = 3d | hot data span = 0d (nothing is cached) | Admins = aaduser=jack@contoso.com, Viewers = aaduser=jill@contoso.com |

Steps to execute

Prerequisite: Set up cluster MyFollowerCluster to follow database MyDatabase from cluster MyLeaderCluster.

Show the current configuration

See the current configuration according to which MyDatabase is being followed on MyFollowerCluster:

.show follower database MyDatabase
| evaluate narrow() // just for presentation purposes
| Column | Value |
|--|--|
| DatabaseName | MyDatabase |
| LeaderClusterMetadataPath | https://storageaccountname.blob.core.windows.net/cluster |
| CachingPolicyOverride | null |
| AuthorizedPrincipalsOverride | [] |
| AuthorizedPrincipalsModificationKind | None |
| IsAutoPrefetchEnabled | False |
| TableMetadataOverrides | |
| CachingPoliciesModificationKind | Union |

Override authorized principals

Replace the collection of authorized principals for MyDatabase on MyFollowerCluster with a collection that includes only one Microsoft Entra user as the database admin, and one Microsoft Entra user as a database viewer:

.add follower database MyDatabase admins ('aaduser=jack@contoso.com')

.add follower database MyDatabase viewers ('aaduser=jill@contoso.com')

.alter follower database MyDatabase principals-modification-kind = replace

Only those two specific principals are authorized to access MyDatabase on MyFollowerCluster:

.show database MyDatabase principals
| Role | PrincipalType | PrincipalDisplayName | PrincipalObjectId | PrincipalFQN | Notes |
|--|--|--|--|--|--|
| Database MyDatabase Admin | Microsoft Entra user | Jack Kusto (upn: jack@contoso.com) | 12345678-abcd-efef-1234-350bf486087b | aaduser=87654321-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 | |
| Database MyDatabase Viewer | Microsoft Entra user | Jill Kusto (upn: jack@contoso.com) | abcdefab-abcd-efef-1234-350bf486087b | aaduser=54321789-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 | |
.show follower database MyDatabase
| mv-expand parse_json(AuthorizedPrincipalsOverride)
| project AuthorizedPrincipalsOverride.Principal.FullyQualifiedName
| AuthorizedPrincipalsOverride_Principal_FullyQualifiedName |
|--|
| aaduser=87654321-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 |
| aaduser=54321789-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 |

Override Caching policies

Replace the collection of database and table-level caching policies for MyDatabase on MyFollowerCluster by setting all tables to not have their data cached, excluding two specific tables - MyTable1, MyTable2 - that will have their data cached for periods of 1d and 3d, respectively:

.alter follower database MyDatabase policy caching hot = 0d

.alter follower database MyDatabase table MyTable1 policy caching hot = 1d

.alter follower database MyDatabase table MyTable2 policy caching hot = 3d

.alter follower database MyDatabase caching-policies-modification-kind = replace

Only those two specific tables have data cached, and the rest of the tables have a hot data period of 0d:

.show tables details
| summarize TableNames = make_list(TableName) by CachingPolicy
| CachingPolicy | TableNames |
|--|--|
| {"DataHotSpan":{"Value":"1.00:00:00"},"IndexHotSpan":{"Value":"1.00:00:00"}} | ["MyTable1"] |
| {"DataHotSpan":{"Value":"3.00:00:00"},"IndexHotSpan":{"Value":"3.00:00:00"}} | ["MyTable2"] |
| {"DataHotSpan":{"Value":"0.00:00:00"},"IndexHotSpan":{"Value":"0.00:00:00"}} | ["MyTable3",…,"MyTableN"] |
.show follower database MyDatabase
| mv-expand parse_json(TableMetadataOverrides)
| project TableMetadataOverrides
| TableMetadataOverrides |
|--|
| {"MyTable1":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"1.00:00:00"},"IndexHotSpan":{"Value":"1.00:00:00"}}}} |
| {"MyTable2":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"3.00:00:00"},"IndexHotSpan":{"Value":"3.00:00:00"}}}} |

Summary

See the current configuration where MyDatabase is being followed on MyFollowerCluster:

.show follower database MyDatabase
| evaluate narrow() // just for presentation purposes
| Column | Value |
|--|--|
| DatabaseName | MyDatabase |
| LeaderClusterMetadataPath | https://storageaccountname.blob.core.windows.net/cluster |
| CachingPolicyOverride | {"DataHotSpan":{"Value":"00:00:00"},"IndexHotSpan":{"Value":"00:00:00"}} |
| AuthorizedPrincipalsOverride | [{"Principal":{"FullyQualifiedName":"aaduser=87654321-abcd-efef-1234-350bf486087b",…},{"Principal":{"FullyQualifiedName":"aaduser=54321789-abcd-efef-1234-350bf486087b",…}] |
| AuthorizedPrincipalsModificationKind | Replace |
| IsAutoPrefetchEnabled | False |
| TableMetadataOverrides | {"MyTargetTable":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"3.00:00:00"}…},"MySourceTable":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"1.00:00:00"},…}}} |
| CachingPoliciesModificationKind | Replace |

1.5 - Data soft delete

1.5.1 - Data soft delete

This article describes data soft delete.

The ability to delete individual records is supported. Record deletion is commonly achieved using one of the following methods:

  • To delete records with a system guarantee that the storage artifacts containing these records are deleted as well, use .purge
  • To delete records without such a guarantee, use .delete as described in this article - this command marks records as deleted but doesn’t necessarily delete the data from storage artifacts. This deletion method is faster than purge.

For information on how to use the command, see Syntax.

Use cases

This deletion method should only be used for the unplanned deletion of individual records. For example, if you discover that an IoT device is reporting corrupt telemetry for some time, you should consider using this method to delete the corrupt data.

If you need to frequently delete records for deduplication or updates, we recommend using materialized views. See choose between materialized views and soft delete for data deduplication.

Deletion process

The soft delete process is performed using the following steps:

  1. Run predicate query: The table is scanned to identify data extents that contain records to be deleted. The extents identified are those with one or more records returned by the predicate query.
  2. Extents replacement: The identified extents are replaced with new extents that point to the original data blobs, and also have a new hidden column of type bool that indicates per record whether it was deleted or not. Once completed, if no new data is ingested, the predicate query won’t return any records if run again.

Limitations and considerations

  • The deletion process is final and irreversible. It isn’t possible to undo this process or recover data that has been deleted, even though the storage artifacts aren’t necessarily deleted following the operation.

  • Soft delete is supported for native tables and materialized views. It isn’t supported for external tables.

  • Before running soft delete, verify the predicate by running a query and checking that the results match the expected outcome. You can also run the command in whatif mode, which returns the number of records that are expected to be deleted.

  • Don’t run multiple parallel soft delete operations on the same table, as this may result in failures of some or all the commands. However, it’s possible to run multiple parallel soft delete operations on different tables.

  • Don’t run soft delete and purge commands on the same table in parallel. First wait for one command to complete and only then run the other command.

  • Soft delete is executed against your cluster URI: https://[YourClusterName].[region].kusto.windows.net. The command requires database admin permissions on the relevant database.

  • Deleting records from a table that is a source table of a materialized view can have an impact on the materialized view. If the deleted records weren't yet processed by the materialization cycle, they will be missing from the view, because they will never be processed. Similarly, the deletion won't have an impact on the materialized view if the records were already processed.

  • Limitations on the predicate:

    • It must contain at least one where operator.
    • It can only reference the table from which records are to be deleted.
    • Only the following operators are allowed: extend, order, project, take and where. Within toscalar(), the summarize operator is also allowed.
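
For example, a hedged sketch of a predicate that respects these limitations, run in whatif mode (described in the soft delete command article) with hypothetical table and column names:

// Dry run: report how many records would be deleted, without deleting anything
.delete table MyTable records with (whatif=true) <| MyTable
| where DeviceId == 'D1' and Timestamp between (datetime(2024-01-01) .. datetime(2024-02-01))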

Deletion performance

The main considerations that can impact the deletion process performance are:

  • Run predicate query: The performance of this step is very similar to the performance of the predicate itself. It might be slightly faster or slower depending on the predicate, but the difference is expected to be insignificant.
  • Extents replacement: The performance of this step depends on the following:
    • Record distribution across the data extents in the cluster
    • The number of nodes in the cluster

Unlike .purge, the .delete command doesn’t reingest the data. It just marks records that are returned by the predicate query as deleted and is therefore much faster.

Query performance after deletion

Query performance isn’t expected to noticeably change following the deletion of records.

Performance degradation isn’t expected because the filter that is automatically added on all queries that filter out records that were deleted is efficient.

However, query performance is also not guaranteed to improve. While performance improvement may happen for some types of queries, it may not happen for some others. In order to improve query performance, extents in which most of the records are deleted are periodically compacted by replacing them with new extents that only contain the records that haven’t been deleted.

Impact on COGS (cost of goods sold)

In most cases, the deletion of records won’t result in a change of COGS.

  • There will be no decrease, because no records are actually deleted. Records are only marked as deleted using a hidden column of type bool, the size of which is negligible.
  • In most cases, there will be no increase because the .delete operation doesn’t require the provisioning of extra resources.
  • In some cases, extents in which the majority of the records are deleted are periodically compacted by replacing them with new extents that only contain the records that haven’t been deleted. This causes the deletion of the old storage artifacts that contain a large number of deleted records. The new extents are smaller and therefore consume less space in both the Storage account and in the hot cache. However, in most cases, the effect of this on COGS is negligible.

1.5.2 - Data soft delete command

This article describes the data soft delete commands.

To soft delete individual records without a system guarantee that the storage artifacts containing these records are deleted as well, use the following command. This command marks records as deleted but doesn’t necessarily delete the data from storage artifacts. For more information, see Soft delete.

To delete individual records with a system guarantee that the storage artifacts containing these records are deleted as well, see Data purge.

Syntax

.delete [async] table TableName records [with ( propertyName = propertyValue [, …])] <| Predicate

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| async | string | | If specified, indicates that the command runs in asynchronous mode. |
| TableName | string | ✔️ | The name of the table from which to delete records. |
| propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties. |
| Predicate | string | ✔️ | The predicate that returns records to delete, which is specified as a query. See note. |

Supported properties

| Name | Type | Description |
|--|--|--|
| whatif | bool | If true, returns the number of records that will be deleted in every shard, without actually deleting any records. The default is false. |

Returns

The output of the command contains information about which extents were replaced.

Example: delete records of a given user

To delete all the records that contain data of a given user:

.delete table MyTable records <| MyTable | where UserId == 'X'

Example: check how many records would be deleted from a table

To determine the number of records that would be deleted by the operation without actually deleting them, check the value in the RecordsMatchPredicate column when running the command in whatif mode:

.delete table MyTable records with (whatif=true) <| MyTable | where UserId == 'X'

.delete materialized-view records - soft delete command

When soft delete is executed on materialized views, the same concepts and limitations apply.

Syntax - materialized views

.delete [async] materialized-view MaterializedViewName records [with ( propertyName = propertyValue [, …])] <| Predicate

Parameters - materialized views

| Name | Type | Required | Description |
|--|--|--|--|
| async | string | | If specified, indicates that the command runs in asynchronous mode. |
| MaterializedViewName | string | ✔️ | The name of the materialized view from which to delete records. |
| propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties. |
| Predicate | string | ✔️ | The predicate that returns records to delete. Specified as a query. |

Supported properties - materialized views

| Name | Type | Description |
|--|--|--|
| whatif | bool | If true, returns the number of records that will be deleted in every shard, without actually deleting any records. The default is false. |

Example - materialized views

To delete all the materialized view records that contain data of a given user:

.delete materialized-view MyMaterializedView records <| MyMaterializedView | where UserId == 'X'

Example: check how many records would be deleted from a materialized view

To determine the number of records that would be deleted by the operation without actually deleting them, check the value in the RecordsMatchPredicate column while running the command in whatif mode:

.delete materialized-view MyMaterializedView records with (whatif=true) <| MyMaterializedView | where UserId == 'X'

1.6 - Extents (data shards)

1.6.1 - Extent tags

Learn how to create and use extent tags.

An extent tag is a string that describes properties common to all data in an extent. For example, during data ingestion, you can append an extent tag to signify the source of the ingested data. Then, you can use this tag for analysis.

Extents can hold multiple tags as part of their metadata. When extents merge, their tags also merge, ensuring consistent metadata representation.

To see the tags associated with an extent, use the .show extents command. For a granular view of tags associated with records within an extent, use the extent-tags() function.
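
For example, the following hedged sketch shows both views for a hypothetical table MyTable; in query text the per-record function is written extent_tags():

// Extent-level view: list the table's extents together with their tags
.show table MyTable extents

// Record-level view: count records by the tags of the extent they reside in
MyTable
| extend Tags = extent_tags()
| summarize RecordCount = count() by tostring(Tags)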

drop-by extent tags

Tags that start with a drop-by: prefix can be used to control which other extents to merge with. Extents that have the same set of drop-by: tags can be merged together, but they won’t be merged with other extents if they have a different set of drop-by: tags.

Examples

Determine which extents can be merged together

If:

  • Extent 1 has the following tags: drop-by:blue, drop-by:red, green.
  • Extent 2 has the following tags: drop-by:red, yellow.
  • Extent 3 has the following tags: purple, drop-by:red, drop-by:blue.

Then:

  • Extents 1 and 2 won’t be merged together, as they have a different set of drop-by tags.
  • Extents 2 and 3 won’t be merged together, as they have a different set of drop-by tags.
  • Extents 1 and 3 can be merged together, as they have the same set of drop-by tags.

Use drop-by tags as part of extent-level operations

The following commands ingest data with a drop-by: tag and then drop all extents that carry that tag.

.ingest ... with @'{"tags":"[\"drop-by:2016-02-17\"]"}'

.drop extents <| .show table MyTable extents where tags has "drop-by:2016-02-17" 

ingest-by extent tags

Tags with the prefix ingest-by: can be used together with the ingestIfNotExists property to ensure that data is ingested only once.

The ingestIfNotExists property prevents duplicate ingestion by checking if an extent with the specified ingest-by: tag already exists. Typically, an ingest command contains an ingest-by: tag and the ingestIfNotExists property with the same value.

Examples

Add a tag on ingestion

The following command ingests the data and adds the tag ingest-by:2016-02-17.

.ingest ... with (tags = '["ingest-by:2016-02-17"]')

Prevent duplicate ingestion

The following command ingests the data so long as no extent in the table has the ingest-by:2016-02-17 tag.

.ingest ... with (ingestIfNotExists = '["2016-02-17"]')

Prevent duplicate ingestion and add a tag to any new data

The following command ingests the data so long as no extent in the table has the ingest-by:2016-02-17 tag. Any newly ingested data gets the ingest-by:2016-02-17 tag.

.ingest ... with (ingestIfNotExists = '["2016-02-17"]', tags = '["ingest-by:2016-02-17"]')

Limitations

  • Extent tags can only be applied to records within an extent. Consequently, tags can’t be set on streaming ingestion data before it is stored in extents.
  • Extent tags can’t be stored on data in external tables or materialized views.

1.6.2 - Extents (data shards)

This article describes Extents (data shards).

Tables are partitioned into extents, or data shards. Each extent is a horizontal segment of the table that contains data and metadata such as its creation time and optional tags. The union of all these extents contains the entire dataset of the table. Extents are evenly distributed across nodes in the cluster, and they’re cached in both local SSD and memory for optimized performance.

Extents are immutable, meaning they can be queried, reassigned to a different node, or dropped out of the table but never modified. Data modification happens by creating new extents and transactionally swapping old extents with the new ones. The immutability of extents provides benefits such as increased robustness and easy reversion to previous snapshots.

Extents hold a collection of records that are physically arranged in columns, enabling efficient encoding and compression of the data. To maintain query efficiency, smaller extents are merged into larger extents according to the configured merge policy and sharding policy. Merging extents reduces management overhead and leads to index optimization and improved compression.

The common extent lifecycle is as follows:

  1. The extent is created by an ingestion operation.
  2. The extent is merged with other extents.
  3. The merged extent (possibly one that tracks its lineage to other extents) is eventually dropped because of a retention policy.

Extent creation time

Two datetime values are tracked per extent: MinCreatedOn and MaxCreatedOn. These values are initially the same, but they may change when the extent is merged with other extents. When extents merge, the new values are the minimum and maximum of the original values across the merged extents.

The creation time of an extent is used for the following purposes:

  • Retention: Extents created earlier are dropped earlier.
  • Caching: Extents created recently are kept in hot cache.
  • Sampling: Recent extents are preferred when using query operations such as take.

To overwrite the creation time of an extent, provide an alternate creationTime in the data ingestion properties. This can be useful for retention purposes, such as if you want to reingest data but don’t want it to appear as if it arrived late.
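
For example, a hedged sketch of an .ingest command that sets creationTime; the table name and source URL are placeholders:

// Reingest historical data while preserving its original creation time
.ingest into table MyTable (h'https://mystorageaccount.blob.core.windows.net/container/2023-01-01-data.csv;<SAS token>')
    with (format='csv', creationTime='2023-01-01T00:00:00Z')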

1.6.3 - Extents commands

2 - Cross-cluster schema

3 - Continuous data export

3.1 - .export to SQL

This article describes Export data to SQL.

Export data to SQL allows you to run a query and have its results sent to a table in an SQL database, such as an SQL database hosted by the Azure SQL Database service.

Permissions

You must have at least Table Admin permissions to run this command.

Syntax

.export [async] to sql sqlTableName sqlConnectionString [with (propertyName = propertyValue [, …])] <| query

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| async | string | | If specified, the command runs asynchronously. |
| SqlTableName | string | ✔️ | The name of the SQL database table into which to insert the data. To protect against injection attacks, this name is restricted. |
| SqlConnectionString | string | ✔️ | The connection string for the SQL endpoint and database. The string must follow the ADO.NET connection string format. For security reasons, the connection string is restricted. |
| PropertyName, PropertyValue | string | | A list of optional properties. |

Supported properties

| Name | Values | Description |
|--|--|--|
| firetriggers | true or false | If true, instructs the target system to fire INSERT triggers defined on the SQL table. The default is false. For more information, see BULK INSERT and System.Data.SqlClient.SqlBulkCopy. |
| createifnotexists | true or false | If true, the target SQL table is created if it doesn't already exist; the primarykey property must be provided in this case to indicate the result column that is the primary key. The default is false. |
| primarykey | | If createifnotexists is true, this property indicates the name of the column in the result that is used as the SQL table's primary key if it's created by this command. |
| persistDetails | bool | Indicates that the command should persist its results (see async flag). Defaults to true in async runs, but can be turned off if the caller doesn't require the results. Defaults to false in synchronous executions, but can be turned on. |
| token | string | The Microsoft Entra access token that Kusto forwards to the SQL endpoint for authentication. When set, the SQL connection string shouldn't include authentication information like Authentication, User ID, or Password. |

Authentication and authorization

The authentication method is based on the connection string provided, and the permissions required to access the SQL database vary depending on the authentication method.

The supported authentication methods for exporting data to SQL are Microsoft Entra integrated (impersonation) authentication and username/password authentication. For impersonation authentication, be sure that the principal has the following permissions on the database:

  • Existing table: table UPDATE and INSERT
  • New table: CREATE, UPDATE, and INSERT

Limitations and restrictions

There are some limitations and restrictions when exporting data to an SQL database:

  1. Kusto is a cloud service, so the connection string must point to a database that is accessible from the cloud. (In particular, one can’t export to an on-premises database since it’s not accessible from the public cloud.)

  2. Kusto supports Active Directory Integrated authentication when the calling principal is a Microsoft Entra principal (aaduser= or aadapp=). Alternatively, Kusto also supports providing the credentials for the SQL database as part of the connection string. Other methods of authentication aren't supported. The identity presented to the SQL database always emanates from the command caller, not from the Kusto service identity itself.

  3. If the target table in the SQL database exists, it must match the query result schema. In some cases, such as Azure SQL Database, this means that the table has one column marked as an identity column.

  4. Exporting large volumes of data might take a long time. It’s recommended that the target SQL table is set for minimal logging during bulk import. See SQL Server Database Engine > … > Database Features > Bulk Import and Export of Data.

  5. Data export is performed using SQL bulk copy and provides no transactional guarantees on the target SQL database. See Transaction and Bulk Copy Operations.

  6. The SQL table name is restricted to a name consisting of letters, digits, spaces, underscores (_), dots (.) and hyphens (-).

  7. The SQL connection string is restricted as follows: Persist Security Info is explicitly set to false, Encrypt is set to true, and Trust Server Certificate is set to false.

  8. The primary key property on the column can be specified when creating a new SQL table. If the column is of type string, then SQL might refuse to create the table due to other limitations on the primary key column. The workaround is to manually create the table in SQL before exporting the data. This limitation exists because primary key columns in SQL can’t be of unlimited size, but Kusto table columns don’t have declared size limitations.

For more information, see the Azure SQL Database Microsoft Entra integrated authentication documentation.

Examples

Asynchronous export to SQL table

In the following example, Kusto runs the query and then exports the first record set produced by the query to the MySqlTable table in the MyDatabase database in server myserver.

.export async to sql MySqlTable
    h@"Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=MyDatabase;Connection Timeout=30;"
    <| print Id="d3b68d12-cbd3-428b-807f-2c740f561989", Name="YSO4", DateOfBirth=datetime(2017-10-15)

Export to SQL table if it doesn’t exist

In the following example, Kusto runs the query and then exports the first record set produced by the query to the MySqlTable table in the MyDatabase database in server myserver. The target table is created if it doesn’t exist in the target database.

.export async to sql ['dbo.MySqlTable']
    h@"Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=MyDatabase;Connection Timeout=30;"
    with (createifnotexists="true", primarykey="Id")
    <| print Message = "Hello World!", Timestamp = now(), Id=12345678

3.2 - .export to storage

Learn how to export data to cloud storage.

Executes a query and writes the first result set to an external cloud storage, specified by a storage connection string.

Permissions

You must have at least Database Viewer permissions to run this command.

Syntax

.export [async] [compressed] to OutputDataFormat ( StorageConnectionString [, …] ) [with ( PropertyName = PropertyValue [, …] )] <| Query

Parameters

NameTypeRequiredDescription
asyncstringIf specified, the command runs in asynchronous mode. See asynchronous mode.
compressedboolIf specified, the output storage artifacts are compressed in the format specified by the compressionType supported property.
OutputDataFormatstring✔️The data format of the storage artifacts written by the command. Supported values are: csv, tsv, json, and parquet.
StorageConnectionStringstringOne or more storage connection strings that specify which storage to write the data to. More than one storage connection string might be specified for scalable writes. Each such connection string must specify the credentials to use when writing to storage. For example, when writing to Azure Blob Storage, the credentials can be the storage account key, or a shared access key (SAS) with the permissions to read, write, and list blobs.
When you export data to CSV files using a DFS endpoint, the data goes through a DFS managed private endpoint.
When you export data to parquet files, the data goes through a blob managed private endpoint.
PropertyName, PropertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

PropertyTypeDescription
includeHeadersstringFor csv/tsv output, controls the generation of column headers. Can be one of none (default; no header lines emitted), all (emit a header line into every storage artifact), or firstFile (emit a header line into the first storage artifact only).
fileExtensionstringThe “extension” part of the storage artifact (for example, .csv or .tsv). If compression is used, .gz is appended as well.
namePrefixstringThe prefix to add to each generated storage artifact name. A random prefix is used if left unspecified.
encodingstringThe encoding for text. Possible values include: UTF8NoBOM (default) or UTF8BOM.
compressionTypestringThe type of compression to use. For non-Parquet files, only gzip is allowed. For Parquet files, possible values include gzip, snappy, lz4_raw, brotli, and zstd. Default is gzip.
distributionstringDistribution hint (single, per_node, per_shard). If value equals single, a single thread writes to storage. Otherwise, export writes from all nodes executing the query in parallel. See evaluate plugin operator. Defaults to per_shard.
persistDetailsboolIf true, the command persists its results (see async flag). Defaults to true in async runs, but can be turned off if the caller doesn’t require the results. Defaults to false in synchronous executions, but can be turned on.
sizeLimitlongThe size limit in bytes of a single storage artifact written before compression. Valid range: 100 MB (default) to 4 GB.
parquetRowGroupSizeintRelevant only when data format is Parquet. Controls the row group size in the exported files. Default row group size is 100,000 records.
distributedboolDisable or enable distributed export. Setting to false is equivalent to single distribution hint. Default is true.
parquetDatetimePrecisionstringThe precision to use when exporting datetime values to Parquet. Possible values are millisecond and microsecond. Default is millisecond.

Authentication and authorization

The authentication method is based on the connection string provided, and the permissions required vary depending on the authentication method.

The following table lists the supported authentication methods and the permissions needed for exporting data to external storage by storage type.

Authentication methodAzure Blob Storage / Data Lake Storage Gen2Data Lake Storage Gen1
ImpersonationStorage Blob Data ContributorContributor
Shared Access (SAS) tokenWriteWrite
Microsoft Entra access tokenNo extra permissions requiredNo extra permissions required
Storage account access keyNo extra permissions requiredNo extra permissions required

Returns

The commands return a table that describes the generated storage artifacts. Each record describes a single artifact and includes the storage path to the artifact and how many records it holds.

PathNumRecords
http://storage1.blob.core.windows.net/containerName/export_1_d08afcae2f044c1092b279412dcb571b.csv10
http://storage1.blob.core.windows.net/containerName/export_2_454c0f1359e24795b6529da8a0101330.csv15

Asynchronous mode

If the async flag is specified, the command executes in asynchronous mode. In this mode, the command returns immediately with an operation ID, and data export continues in the background until completion. The operation ID returned by the command can be used to track its progress and ultimately its results with the .show operations and .show operation details commands.

For example, after a successful completion, you can retrieve the results using:

.show operation f008dc1e-2710-47d8-8d34-0d562f5f8615 details

Examples

In this example, Kusto runs the query and then exports the first recordset produced by the query to one or more compressed CSV blobs, up to 1 GB before compression. Column name labels are added as the first row for each blob.

.export
  async compressed
  to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey",
    h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
  ) with (
    sizeLimit=1000000000,
    namePrefix="export",
    includeHeaders="all",
    encoding="UTF8NoBOM"
  )
  <| 
  Logs | where id == "1234" 

Failures during export commands

Export commands can transiently fail during execution. Continuous export automatically retries the command. Regular export commands (export to storage, export to external table) don’t perform any retries.

  • When the export command fails, artifacts already written to storage aren’t deleted and remain in storage. If the command fails, assume the export is incomplete, even if some artifacts were written.
  • The best way to track both completion of the command and the artifacts exported upon successful completion is by using the .show operations and .show operation details commands; a short example follows.
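
The following commands check the state of an asynchronous export operation and then list the artifacts it wrote. The operation ID shown is the sample ID used earlier in this section; substitute the ID returned by your own .export command.

.show operations f008dc1e-2710-47d8-8d34-0d562f5f8615

.show operation f008dc1e-2710-47d8-8d34-0d562f5f8615 details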

Storage failures

By default, export commands are distributed such that there might be many concurrent writes to storage. The level of distribution depends on the type of export command:

  • The default distribution for regular .export command is per_shard, which means all extents that contain data to export write to storage concurrently.

  • The default distribution for export to external table commands is per_node, which means the concurrency is the number of nodes.

When the number of extents/nodes is large, this might lead to high load on storage, resulting in storage throttling or transient storage errors. The following suggestions might help overcome these errors, in order of priority:

  • Increase the number of storage accounts provided to the export command or to the external table definition. The load is evenly distributed between the accounts.

  • Reduce the concurrency by setting the distribution hint to per_node (see command properties).

  • Reduce the number of nodes exporting concurrently by setting the client request property query_fanout_nodes_percent to the desired concurrency (percent of nodes). The property can be set as part of the export query. For example, the following command limits the number of nodes writing to storage concurrently to 50% of the nodes:

    .export async  to csv
        ( h@"https://storage1.blob.core.windows.net/containerName;secretKey" ) 
        with
        (
            distribution="per_node"
        ) 
        <| 
        set query_fanout_nodes_percent = 50;
        ExportQuery
    
  • Reduce the number of threads exporting concurrently on each node when using per-shard export, by setting the client request property query_fanout_threads_percent to the desired concurrency (percent of threads). The property can be set as part of the export query. For example, the following command limits the number of threads writing to storage concurrently to 50% on each of the nodes:

    .export async  to csv
        ( h@"https://storage1.blob.core.windows.net/containerName;secretKey" ) 
        with
        (
            distribution="per_shard"
        ) 
        <| 
        set query_fanout_threads_percent = 50;
        ExportQuery
    
  • If exporting to a partitioned external table, setting the spread/concurrency properties can reduce concurrency (see details in the command properties).

  • If none of the previous recommendations work, you can completely disable distribution by setting the distributed property to false. However, we don’t recommend doing so, as it might significantly affect the command performance.

Authorization failures

Authentication or authorization failures during export commands can occur when the credentials provided in the storage connection string aren’t permitted to write to storage. If you’re using impersonation authentication or a user-delegated SAS token for the export command, the Storage Blob Data Contributor role is required to write to the storage account. For more information, see Storage connection strings.

Data types mapping

Parquet data types mapping

On export, Kusto data types are mapped to Parquet data types using the following rules:

Kusto Data TypeParquet Data TypeParquet AnnotationComments
boolBOOLEAN
datetimeINT64TIMESTAMP_MICROS
dynamicBYTE_ARRAYUTF-8Serialized as JSON string
guidBYTE_ARRAYUTF-8
intINT32
longINT64
realDOUBLE
stringBYTE_ARRAYUTF-8
timespanINT64Stored as ticks (100-nanosecond units) count
decimalFIXED_LENGTH_BYTE_ARRAYDECIMAL

3.3 - .export to table

This article describes Export data to an external table.

You can export data by defining an external table and exporting data to it. The table properties are specified when creating the external table. The export command references the external table by name.

Permissions

You must have at least Table Admin permissions to run this command.

Syntax

.export [async] to table externalTableName
[with (propertyName = propertyValue [, …])] <| query

Parameters

NameTypeRequiredDescription
externalTableNamestring✔️The name of the external table to which to export.
propertyName, propertyValuestringA comma-separated list of optional properties.
querystring✔️The export query.

Supported properties

The following properties are supported as part of the export to external table command.

PropertyTypeDescriptionDefault
sizeLimitlongThe size limit in bytes of a single storage artifact written before compression. A full row group of size parquetRowGroupSize is written before checking whether this row group reaches the size limit and should start a new artifact. Valid range: 100 MB (default) to 1 GB.
distributedboolDisable or enable distributed export. Setting to false is equivalent to single distribution hint.true
distributionstringDistribution hint (single, per_node, per_shard). See more details in Distribution settingsper_node
distributionKindstringOptionally switches to uniform distribution when the external table is partitioned by string partition. Valid values are uniform or default. See more details in Distribution settings
concurrencyNumberHints the system how many partitions to run in parallel. See more details in Distribution settings16
spreadNumberHints the system how to distribute the partitions among nodes. See more details in Distribution settingsMin(64, number-of-nodes)
parquetRowGroupSizeintRelevant only when data format is Parquet. Controls the row group size in the exported files. This value takes precedence over sizeLimit, meaning a full row group will be exported before checking whether this row group reaches the size limit and should start a new artifact.100,000

Distribution settings

The distribution of an export to external table operation indicates the number of nodes and threads that are writing to storage concurrently. The default distribution depends on the external table partitioning:

External table partitioningDefault distribution
External table isn’t partitioned, or partitioned by datetime column onlyExport is distributed per_node - all nodes are exporting concurrently. Each node writes the data assigned to that node. The number of files exported by a node is greater than one, only if the size of the data from that node exceeds sizeLimit.
External table is partitioned by a string columnThe data to export is moved between the nodes, such that each node writes a subset of the partition values. A single partition is always written by a single node. The number of files written per partition should be greater than one only if the data exceeds sizeLimit. If the external table includes several string partitions, then data is partitioned between the nodes based on the first partition. Therefore, the recommendation is to define the partition with the most uniform distribution as the first one.

Change the default distribution settings

Changing the default distribution settings can be useful in the following cases:

Use caseDescriptionRecommendation
Reduce the number of exported filesExport is creating too many small files, and you would like it to create a smaller number of larger files.Set distribution=single or distributed=false (both are equivalent) in the command properties. Only a single thread performs the export. The downside of this is that the export operation can be slower, as concurrency is much reduced.
Reduce the export durationIncreasing the concurrency of the export operation, to reduce its duration.Set distribution=per_shard in the command properties. Doing so means concurrency of the write operations is per data shard, instead of per node. This is only relevant when exporting to an external table that isn’t partitioned by string partition. This might create too much load on storage, potentially resulting in throttling. See Storage failures.
Reduce the export duration for external tables that are partitioned by a string partitionIf the partitions aren’t uniformly distributed between the nodes, export might take a longer time to run. If one partition is much larger than the others, the node assigned to that partition does most of the export work, while the other nodes remain mostly idle. For more information, see Distribution settings.There are several settings you can change:
* If there’s more than one string partition, define the one with best distribution first.

* Set distributionKind=uniform in the command properties. This setting disables the default distribution settings for string-partitioned external tables. Export runs with per-node distribution, and each node exports the data assigned to it. A single partition might be written by several nodes, and the number of files increases accordingly. To increase concurrency even further, set distributionKind=uniform along with distribution=per_shard for the highest concurrency, at the cost of potentially many more files written. A sketch follows this table.

* If the cause for slow export isn’t outliers in the data, reduce duration by increasing concurrency, without changing partitioning settings. Use the hint.spread and hint.concurrency properties, which determine the concurrency of the partitioning. See partition operator. By default, the number of nodes exporting concurrently (the spread) is the minimum value between 64 and the number of nodes. Setting spread to a higher number than number of nodes increases the concurrency on each node (max value for spread is 64).
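
The following minimal sketch shows how the distributionKind property can be set on an export to a string-partitioned external table; PartitionedByCustomer and T are placeholder names:

.export to table PartitionedByCustomer
with (distributionKind="uniform")
<| T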

Authentication and authorization

In order to export to an external table, you must set up write permissions. For more information, see the Write permissions for Azure Storage external table or SQL Server external table.

Output

Output parameterTypeDescription
ExternalTableNamestringThe name of the external table.
PathstringOutput path.
NumRecordsstringNumber of records exported to path.

Notes

  • The export query output schema must match the schema of the external table, including all columns defined by the partitions. For example, if the table is partitioned by DateTime, the query output schema must have a Timestamp column matching the TimestampColumnName. This column name is defined in the external table partitioning definition.

  • It isn’t possible to override the external table properties using the export command. For example, you can’t export data in Parquet format to an external table whose data format is CSV.

  • If the external table is partitioned, exported artifacts are written to their respective directories according to the partition definitions. For an example, see partitioned external table example.

    • If a partition value is null/empty or is an invalid directory value, per the definitions of the target storage, the partition value is replaced with a default value of __DEFAULT_PARTITION__.
  • For suggestions to overcome storage errors during export commands, see failures during export commands.

  • External table columns are mapped to suitable target format data types, according to data types mapping rules.

  • Parquet native export is a more performant, resource-light export mechanism. An exported datetime column is currently unsupported by Synapse SQL COPY.

Number of files

The number of files written per partition depends on the distribution settings of the export operation:

  • If the external table includes datetime partitions only, or no partitions at all, the number of files written for each partition that exists should be similar to the number of nodes (or more, if sizeLimit is reached). When the export operation is distributed, all nodes export concurrently. To disable distribution, so that only a single node does the writes, set distributed to false. This process creates fewer files, but reduces the export performance.

  • If the external table includes a partition by a string column, the number of exported files should be a single file per partition (or more, if sizeLimit is reached). All nodes still participate in the export (the operation is distributed), but each partition is assigned to a specific node. Setting distributed to false causes only a single node to do the export, but the behavior remains the same (a single file written per partition).

Examples

Non-partitioned external table example

The following example exports data from table T to the ExternalBlob table. ExternalBlob is a non-partitioned external table.

.export to table ExternalBlob <| T

Output

ExternalTableNamePathNumRecords
ExternalBlobhttp://storage1.blob.core.windows.net/externaltable1cont1/1_58017c550b384c0db0fea61a8661333e.csv10

Partitioned external table example

The following example first creates a partitioned external table, PartitionedExternalBlob, with a specified blob storage location. The data is stored in CSV format with a path format that organizes the data by customer name and date.

.create external table PartitionedExternalBlob (Timestamp:datetime, CustomerName:string) 
kind=blob
partition by (CustomerName:string=CustomerName, Date:datetime=startofday(Timestamp))   
pathformat = ("CustomerName=" CustomerName "/" datetime_pattern("yyyy/MM/dd", Date))   
dataformat=csv
( 
   h@'http://storageaccount.blob.core.windows.net/container1;secretKey'
)

It then exports data from table T to the PartitionedExternalBlob external table.

.export to table PartitionedExternalBlob <| T

Output

ExternalTableNamePathNumRecords
ExternalBlobhttp://storageaccount.blob.core.windows.net/container1/CustomerName=customer1/2019/01/01/fa36f35c-c064-414d-b8e2-e75cf157ec35_1_58017c550b384c0db0fea61a8661333e.csv10
ExternalBlobhttp://storageaccount.blob.core.windows.net/container1/CustomerName=customer2/2019/01/01/fa36f35c-c064-414d-b8e2-e75cf157ec35_2_b785beec2c004d93b7cd531208424dc9.csv10

If the command is executed asynchronously by using the async keyword, the output is available using the show operation details command.

3.4 - Data export

Learn how to export data.

Data export involves executing a Kusto query and saving its results. This process can be carried out either on the client side or the service side.

For examples on data export, see Related content.

Client-side export

Client-side export gives you control over saving query results either to the local file system or pushing them to a preferred storage location. This flexibility is facilitated by using Kusto client libraries. You can create an app to run queries, read the desired data, and implement an export process tailored to your requirements.

Alternatively, you can use a client tool like the Azure Data Explorer web UI to export data from your Kusto cluster. For more information, see Share queries.

Service-side export (pull)

Use the ingest from query commands to pull query results into a table in the same or different database. See the performance tips before using these commands.

Service-side export (push)

For scalable data export, the service offers various .export management commands to push query results to cloud storage, an external table, or an SQL table. This approach enhances scalability by avoiding the bottleneck of streaming through a single network connection.

Continuous data export is supported for export to external tables.

3.5 - Continuous data export

3.5.1 - .create or alter continuous-export

This article describes how to create or alter continuous data export.

Creates or alters a continuous export job.

Permissions

You must have at least Database Admin permissions to run this command.

Syntax

.create-or-alter continuous-export continuousExportName [over (T1, T2 )] to table externalTableName [with (propertyName = propertyValue [, …])] <| query

Parameters

NameTypeRequiredDescription
continuousExportNamestring✔️The name of the continuous export. Must be unique within the database.
externalTableNamestring✔️The name of the external table export target.
querystring✔️The query to export.
T1, T2stringA comma-separated list of fact tables in the query. If not specified, all tables referenced in the query are assumed to be fact tables. If specified, tables not in this list are treated as dimension tables and aren’t scoped, so all records participate in all exports. See continuous data export overview for details.
propertyName, propertyValuestringA comma-separated list of optional properties.

Supported properties

PropertyTypeDescription
intervalBetweenRunsTimespanThe time span between continuous export executions. Must be greater than 1 minute.
forcedLatencyTimespanAn optional period of time to limit the query to records ingested before a specified period relative to the current time. This property is useful if, for example, the query performs some aggregations or joins, and you want to make sure all relevant records have been ingested before running the export.
sizeLimitlongThe size limit in bytes of a single storage artifact written before compression. Valid range: 100 MB (default) to 1 GB.
distributedboolDisable or enable distributed export. Setting to false is equivalent to single distribution hint. Default is true.
parquetRowGroupSizeintRelevant only when data format is Parquet. Controls the row group size in the exported files. Default row group size is 100,000 records.
managedIdentitystringThe managed identity for which the continuous export job runs. The managed identity can be an object ID, or the system reserved word. For more information, see Use a managed identity to run a continuous export job.
isDisabledboolDisable or enable the continuous export. Default is false.

Example

The following example creates or alters a continuous export MyExport that exports data from the T table to ExternalBlob. The data exports occur every hour, and have a defined forced latency and size limit per storage artifact.

.create-or-alter continuous-export MyExport
over (T)
to table ExternalBlob
with
(intervalBetweenRuns=1h, 
 forcedLatency=10m, 
 sizeLimit=104857600)
<| T
NameExternalTableNameQueryForcedLatencyIntervalBetweenRunsCursorScopedTablesExportProperties
MyExportExternalBlobT00:10:0001:00:00["['DB'].['T']"]{"SizeLimit": 104857600}

3.5.2 - .drop continuous-export

This article describes how to drop continuous data export.

Drops a continuous-export job.

Permissions

You must have at least Database Admin permissions to run this command.

Syntax

.drop continuous-export ContinuousExportName

Parameters

NameTypeRequiredDescription
ContinuousExportNamestring✔️The name of the continuous export.

Returns

The remaining continuous exports in the database (after the deletion). The output schema is the same as that of the show continuous export command.
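
Example

The following command drops a continuous export named MyExport. The name is a placeholder; use the name of your own continuous export.

.drop continuous-export MyExport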

3.5.3 - .show continuous data-export failures

This article describes how to show continuous data export failures.

Returns all failures logged as part of the continuous export within the past 14 days. To view only a specific time range, filter the results by the Timestamp column.

The command doesn’t return any results if executed on a follower database; it must be executed against the leader database.

The command doesn’t return any results if executed on a database shortcut; it must be executed against the leader database.

Permissions

You must have at least Database Monitor or Database Admin permissions to run this command. For more information, see role-based access control.

Syntax

.show continuous-export ContinuousExportName failures

Parameters

NameTypeRequiredDescription
ContinuousExportNamestring✔️The name of the continuous export.

Returns

Output parameterTypeDescription
TimestampdatetimeTimestamp of the failure.
OperationIdstringOperation ID of the failure.
NamestringContinuous export name.
LastSuccessRunTimestampThe last successful run of the continuous export.
FailureKindstringFailure/PartialFailure. PartialFailure indicates some artifacts were exported successfully before the failure occurred.
DetailsstringFailure error details.

Example

The following example shows failures from the continuous export MyExport.

.show continuous-export MyExport failures 

Output

TimestampOperationIdNameLastSuccessRunFailureKindDetails
2019-01-01 11:07:41.1887304ec641435-2505-4532-ba19-d6ab88c96a9dMyExport2019-01-01 11:06:35.6308140FailureDetails…

3.5.4 - .show continuous-export

This article describes how to show continuous data export properties.

Returns the properties of a specified continuous export or all continuous exports in the database.

Permissions

You must have at least Database User, Database Viewer, or Database Monitor permissions to run this command. For more information, see role-based access control.

Syntax

.show continuous-export ContinuousExportName

.show continuous-exports

Parameters

NameTypeRequiredDescription
ContinuousExportNamestring✔️The name of the continuous export.

Returns

Output parameterTypeDescription
CursorScopedTablesstringThe list of explicitly scoped (fact) tables (JSON serialized).
ExportPropertiesstringThe export properties (JSON serialized).
ExportedTodatetimeThe last datetime (ingestion time) that was exported successfully.
ExternalTableNamestringThe external table name.
ForcedLatencytimeSpanThe forced latency timespan, if defined. Returns Null if no timespan is defined.
IntervalBetweenRunstimeSpanThe interval between runs.
IsDisabledboolA boolean value indicating whether the continuous export is disabled.
IsRunningboolA boolean value indicating whether the continuous export is currently running.
LastRunResultstringThe results of the last continuous-export run (Completed or Failed).
LastRunTimedatetimeThe last time the continuous export was executed (start time).
NamestringThe name of the continuous export.
QuerystringThe export query.
StartCursorstringThe starting point of the first execution of this continuous export.
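
Example

The following commands return the properties of a continuous export named MyExport (a placeholder name), and then of all continuous exports in the database.

.show continuous-export MyExport

.show continuous-exports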

3.5.5 - .show continuous-export exported-artifacts

This article describes how to show continuous data export artifacts.

Returns all artifacts exported by the continuous-export in all runs. Filter the results by the Timestamp column in the command to view only records of interest. The history of exported artifacts is retained for 14 days.

The command doesn’t return any results if executed on a follower database; it must be executed against the leader database.

The command doesn’t return any results if executed on a database shortcut; it must be executed against the leader database.

Permissions

You must have at least Database Monitor or Database Admin permissions to run this command. For more information, see role-based access control.

Syntax

.show continuous-export ContinuousExportName exported-artifacts

Parameters

NameTypeRequiredDescription
ContinuousExportNamestring✔️The name of the continuous export.

Returns

Output parameterTypeDescription
TimestampdatetimeThe timestamp of the continuous export run.
ExternalTableNamestringName of the external table
PathstringOutput path
NumRecordslongNumber of records exported to path

Example

The following example shows retrieved artifacts from the continuous export MyExport that were exported within the last hour.

.show continuous-export MyExport exported-artifacts | where Timestamp > ago(1h)

Output

TimestampExternalTableNamePathNumRecordsSizeInBytes
2018-12-20 07:31:30.2634216ExternalBlobhttp://storageaccount.blob.core.windows.net/container1/1_6ca073fd4c8740ec9a2f574eaa98f579.csv101024

3.5.6 - Continuous data export

This article describes Continuous data export.

This article describes continuous export of data from Kusto to an external table with a periodically run query. The results are stored in the external table, which defines the destination, such as Azure Blob Storage, and the schema of the exported data. This process guarantees that all records are exported “exactly once”, with some exceptions.

By default, continuous export runs in a distributed mode, where all nodes export concurrently, so the number of artifacts depends on the number of nodes. Continuous export isn’t designed for low-latency streaming data.

To enable continuous data export, create an external table and then create a continuous export definition pointing to the external table.

In some cases, you must use a managed identity to successfully configure a continuous export job. For more information, see Use a managed identity to run a continuous export job.

Permissions

All continuous export commands require at least Database Admin permissions.

Continuous export guidelines

  • Output schema:

    • The output schema of the export query must match the schema of the external table to which you export.
  • Frequency:

    • Continuous export runs according to the time period configured for it in the intervalBetweenRuns property. The recommended value for this interval is at least several minutes, depending on the latencies you’re willing to accept. The time interval can be as low as one minute, if the ingestion rate is high.

      [!NOTE] The intervalBetweenRuns serves as a recommendation only, and isn’t guaranteed to be precise. Continuous export isn’t suitable for exporting periodic aggregations. For example, a configuration of intervalBetweenRuns=1h with an hourly aggregation (T | summarize by bin(Timestamp, 1h)) won’t work as expected, since the continuous export won’t run exactly on-the-hour. Therefore, each hourly bin will receive multiple entries in the exported data.

  • Number of files:

    • The number of files exported in each continuous export iteration depends on how the external table is partitioned. For more information, see export to external table command. Each continuous export iteration always writes to new files, and never appends to existing ones. As a result, the number of exported files also depends on the frequency in which the continuous export runs. The frequency parameter is intervalBetweenRuns.
  • External table storage accounts:

    • For best performance, the database and the storage accounts should be colocated in the same Azure region.
    • Continuous export works in a distributed manner, such that all nodes are exporting concurrently. On large databases, and if the exported data volume is large, this might lead to storage throttling. The recommendation is to configure multiple storage accounts for the external table. For more information, see storage failures during export commands.

Exactly once export

To guarantee “exactly once” export, continuous export uses database cursors. The continuous export query shouldn’t include a timestamp filter - the database cursors mechanism ensures that records aren’t processed more than once. Adding a timestamp filter in the query can lead to missing data in the exported data.

IngestionTime policy must be enabled on all tables referenced in the query that should be processed “exactly once” in the export. The policy is enabled by default on all newly created tables.
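
If the policy was disabled on an existing table, it can be re-enabled with the ingestion time policy command. The following is a minimal sketch; MyTable is a placeholder table name.

.alter table MyTable policy ingestiontime true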

The guarantee for “exactly once” export is only for files reported in the show exported artifacts command. Continuous export doesn’t guarantee that each record is written only once to the external table. If a failure occurs after export begins and some of the artifacts were already written to the external table, the external table might contain duplicates. If a write operation was aborted before completion, the external table might contain corrupted files. In such cases, artifacts aren’t deleted from the external table, but they aren’t reported in the show exported artifacts command. Consuming the exported files using the show exported artifacts command guarantees no duplications and no corruptions.

Export from fact and dimension tables

By default, all tables referenced in the export query are assumed to be fact tables. As such, they’re scoped to the database cursor. The syntax explicitly declares which tables are scoped (fact) and which aren’t scoped (dimension). See the over parameter in the create command for details.

The export query includes only the records ingested since the previous export execution. The export query might contain dimension tables, in which case all records of the dimension table are included in all export queries. When using joins between fact and dimension tables in continuous export, keep in mind that records in the fact table are processed only once. If the export runs while records are missing in the dimension tables for some keys, records for those keys are either missed or include null values for the dimension columns in the exported files, depending on whether the query uses an inner or outer join. The forcedLatency property in the continuous-export definition can be useful in such cases, where the fact and dimension tables are ingested at around the same time for matching records.

Monitor continuous export

Monitor the health of your continuous export jobs using the following export metrics:

  • Continuous export max lateness - The maximum lateness (in minutes) of continuous exports in the database, which is the time between now and the minimum ExportedTo time of all continuous export jobs in the database. For more information, see the .show continuous export command.
  • Continuous export result - Success/failure result of each continuous export execution. This metric can be split by the continuous export name.

Use the .show continuous export failures command to see the specific failures of a continuous export job.

Resource consumption

  • The impact of the continuous export on the database depends on the query the continuous export is running. Most resources, such as CPU and memory, are consumed by the query execution.
  • The number of export operations that can run concurrently is limited by the database’s data export capacity. For more information, see Management commands throttling. If the database doesn’t have sufficient capacity to handle all continuous exports, some start lagging behind.
  • The show commands-and-queries command can be used to estimate the resource consumption.
    • Filter on | where ClientActivityId startswith "RunContinuousExports" to view the commands and queries associated with continuous export.

Export historical data

Continuous export starts exporting data only from the point of its creation. Records ingested before that time should be exported separately using the non-continuous export command. Historical data might be too large to be exported in a single export command. If needed, partition the query into several smaller batches.

To avoid duplicates with data exported by continuous export, use the StartCursor value returned by the show continuous export command and export only records where cursor_before_or_at is true for that cursor value. For example:

.show continuous-export MyExport | project StartCursor
StartCursor
636751928823156645

Followed by:

.export async to table ExternalBlob
<| T | where cursor_before_or_at("636751928823156645")

Continuous export from a table with Row Level Security

To create a continuous export job with a query that references a table with a Row Level Security policy, you must configure the job with a managed identity. For more information, see Use a managed identity to run a continuous export job.

Continuous export to delta table - Preview

Continuous export to a delta table is currently in preview.

To define continuous export to a delta table, do the following steps. A combined sketch of both steps follows the list.

  1. Create an external delta table, as described in Create and alter delta external tables on Azure Storage.

    [!NOTE] If the schema isn’t provided, Kusto tries to infer it automatically if a delta table is already defined in the target storage container.
    Delta table partitioning isn’t supported.

  2. Define continuous export to this table using the commands described in Create or alter continuous export.

    [!IMPORTANT] The schema of the delta table must be in sync with the continuous export query. If the underlying delta table changes, the export might start failing with unexpected behavior.
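
The following is a minimal sketch of both steps, using placeholder names (MyDeltaTable, MyDeltaExport, T) and assuming an Azure Blob Storage container accessed with impersonation authentication; see the linked articles for the full syntax and options.

.create external table MyDeltaTable kind=delta 
( 
   h@'https://mystorageaccount.blob.core.windows.net/deltacontainer;impersonate' 
)

.create-or-alter continuous-export MyDeltaExport
over (T)
to table MyDeltaTable
with (intervalBetweenRuns=10m)
<| T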

Limitations

General:

  • The following formats are allowed on target tables: CSV, TSV, JSON, and Parquet.
  • Continuous export isn’t designed to work over materialized views, since a materialized view might be updated, while data exported to storage is always appended and never updated.
  • Continuous export can’t be created on follower databases since follower databases are read-only and continuous export requires write operations.
  • Records in the source table must be ingested into the table directly, by using an update policy, or with ingest from query commands. If records are moved into the table using .move extents or .rename table, continuous export might not process these records. See the limitations described in the Database Cursors page.
  • If the artifacts used by continuous export are intended to trigger Event Grid notifications, see the known issues section in the Event Grid documentation.

Cross-database and cross-cluster:

  • Continuous export doesn’t support cross-cluster calls.
  • Continuous export supports cross-database calls only for dimension tables. All fact tables must reside in the local database. See more details in Export from fact and dimension tables.
  • If the continuous export includes cross-database calls, it must be configured with a managed identity.

Cross-database and cross-Eventhouse:

  • Continuous export doesn’t support cross-Eventhouse calls.
  • Continuous export supports cross-database calls only for dimension tables. All fact tables must reside in the local database. See more details in Export from fact and dimension tables.

Policies:

3.5.7 - Enable or disable continuous data export

This article describes how to disable or enable continuous data export.

Disables or enables the continuous-export job. A disabled continuous export isn’t executed, but its current state is persisted and can be resumed when the continuous export is enabled.

When enabling a continuous export that was disabled for a long time, exporting continues from where it last stopped when it was disabled. This continuation might result in a long-running export, blocking other exports from running if there isn’t sufficient database capacity to serve all processes. Continuous exports are executed in ascending order of last run time, so the oldest export runs first, until the catch-up is complete.

Permissions

You must have at least Database Admin permissions to run these commands.

Syntax

.enable continuous-export ContinuousExportName

.disable continuous-export ContinuousExportName

Parameters

NameTypeRequiredDescription
ContinuousExportNamestring✔️The name of the continuous export.

Returns

The result of the show continuous export command of the altered continuous export.
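
Example

The following commands disable and later re-enable a continuous export named MyExport (a placeholder name).

.disable continuous-export MyExport

.enable continuous-export MyExport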

3.5.8 - Use a managed identity to run a continuous export job

This article describes how to use a managed identity for continuous export.

A continuous export job exports data to an external table with a periodically run query.

The continuous export job should be configured with a managed identity in the following scenarios:

  • When the external table uses impersonation authentication
  • When the query references tables in other databases
  • When the query references tables with an enabled row level security policy

A continuous export job configured with a managed identity is performed on behalf of the managed identity.

In this article, you learn how to configure a system-assigned or user-assigned managed identity and create a continuous export job using that identity.

Prerequisites

Configure a managed identity

There are two types of managed identities:

  • System-assigned: A system-assigned identity is connected to your cluster and is removed when the cluster is removed. Only one system-assigned identity is allowed per cluster.

  • User-assigned: A user-assigned managed identity is a standalone Azure resource. Multiple user-assigned identities can be assigned to your cluster.

Select one of the following tabs to set up your preferred managed identity type.

User-assigned

  1. Follow the steps to Add a user-assigned identity.

  2. In the Azure portal, in the left menu of your managed identity resource, select Properties. Copy and save the Tenant Id and Principal Id for use in the following steps.

    Screenshot of Azure portal area with managed identity IDs.

  3. Run the following .alter-merge policy managed_identity command, replacing <objectId> with the managed identity object ID from the previous step. This command sets a managed identity policy on the cluster that allows the managed identity to be used with continuous export.

    .alter-merge cluster policy managed_identity ```[
        {
          "ObjectId": "<objectId>",
          "AllowedUsages": "AutomatedFlows"
        }
    ]```
    

    [!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

  4. Run the following command to grant the managed identity Database Viewer permissions over all databases used for the continuous export, such as the database that contains the external table.

    .add database <DatabaseName> viewers ('aadapp=<objectId>;<tenantId>')
    

    Replace <DatabaseName> with the relevant database, <objectId> with the managed identity Principal Id from step 2, and <tenantId> with the Microsoft Entra ID Tenant Id from step 2.

System-assigned

  1. Follow the steps to Add a system-assigned identity.

  2. Copy and save the Object (principal) ID for use in a later step.

  3. Run the following .alter-merge policy managed_identity command. This command sets a managed identity policy on the cluster that allows the managed identity to be used with continuous export.

    .alter-merge cluster policy managed_identity ```[
        {
          "ObjectId": "system",
          "AllowedUsages": "AutomatedFlows"
        }
    ]```
    

    [!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

  4. Run the following command to grant the managed identity Database Viewer permissions over all databases used for the continuous export, such as the database that contains the external table.

    .add database <DatabaseName> viewers ('aadapp=<objectId>')
    

    Replace <DatabaseName> with the relevant database and <objectId> with the managed identity Object (principal) ID from step 2.

Set up an external table

External tables refer to data located in Azure Storage, such as Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, or SQL Server.

Select one of the following tabs to set up an Azure Storage or SQL Server external table.

Azure Storage

  1. Create a connection string based on the storage connection string templates. This string indicates the resource to access and its authentication information. For continuous export flows, we recommend impersonation authentication.

  2. Run the .create or .alter external table command to create the table. Use the connection string from the previous step as the storageConnectionString argument.

    For example, the following command creates MyExternalTable that refers to CSV-formatted data in mycontainer of mystorageaccount in Azure Blob Storage. The table has two columns, one for an integer x and one for a string s. The connection string ends with ;impersonate, which indicates to use impersonation authentication to access the data store.

    .create external table MyExternalTable (x:int, s:string) kind=storage dataformat=csv 
    ( 
        h@'https://mystorageaccount.blob.core.windows.net/mycontainer;impersonate' 
    )
    
  3. Grant the managed identity write permissions over the relevant external data store. The managed identity needs write permissions because the continuous export job exports data to the data store on behalf of the managed identity.

    External data storeRequired permissionsGrant the permissions
    Azure Blob StorageStorage Blob Data ContributorAssign an Azure role
    Data Lake Storage Gen2Storage Blob Data ContributorAssign an Azure role
    Data Lake Storage Gen1ContributorAssign an Azure role

SQL Server

  1. Create a SQL Server connection string. This string indicates the resource to access and its authentication information. For continuous export flows, we recommend Microsoft Entra integrated authentication, which is impersonation authentication.

  2. Run the .create or .alter external table command to create the table. Use the connection string from the previous step as the sqlServerConnectionString argument.

    For example, the following command creates MySqlExternalTable that refers to MySqlTable table in MyDatabase of SQL Server. The table has two columns, one for an integer x and one for a string s. The connection string contains ;Authentication=Active Directory Integrated, which indicates to use impersonation authentication to access the table.

    .create external table MySqlExternalTable (x:int, s:string) kind=sql table=MySqlTable
    ( 
       h@'Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=MyDatabase;'
    )
    
  3. Grant the managed identity CREATE, UPDATE, and INSERT permissions over the SQL Server database. The managed identity needs write permissions because the continuous export job exports data to the database on behalf of the managed identity. To learn more, see Permissions.

Create a continuous export job

Select one of the following tabs to create a continuous export job that runs on behalf of a user-assigned or system-assigned managed identity.

User-assigned

Run the .create-or-alter continuous-export command with the managedIdentity property set to the managed identity object ID.

For example, the following command creates a continuous export job named MyExport to export the data in MyTable to MyExternalTable on behalf of a user-assigned managed identity. <objectId> should be a managed identity object ID.

.create-or-alter continuous-export MyExport over (MyTable) to table MyExternalTable with (managedIdentity=<objectId>, intervalBetweenRuns=5m) <| MyTable

System-assigned

Run the .create-or-alter continuous-export command with the managedIdentity property set to system.

For example, the following command creates a continuous export job named MyExport to export the data in MyTable to MyExternalTable on behalf of your system-assigned managed identity.

.create-or-alter continuous-export MyExport over (MyTable) to table MyExternalTable with (managedIdentity="system", intervalBetweenRuns=5m) <| MyTable

4 - Data ingestion

4.1 - .ingest inline command (push)

This article describes the .ingest inline command (push).

This command inserts data into a table by pushing the data included within the command to the table.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] <| Data

.ingest inline into table TableName [with ( IngestionPropertyName = IngestionPropertyValue [, …] )] [ Data ]

Parameters

NameTypeRequiredDescription
TableNamestring✔️The name of the table into which to ingest data. The table name is always relative to the database in context. Its schema is the default schema assumed for the data if no schema mapping object is provided.
Datastring✔️The data content to ingest. Unless otherwise modified by the ingestion properties, this content is parsed as CSV.
IngestionPropertyName, IngestionPropertyValuestringAny number of ingestion properties that affect the ingestion process.

Returns

The result is a table with as many records as the number of generated data shards (“extents”). If no data shards are generated, a single record is returned with an empty (zero-valued) extent ID.

NameTypeDescription
ExtentIdguidThe unique identifier for the data shard that’s generated by the command.

Examples

Ingest with <| syntax

The following command ingests data into a table Purchases with two columns: SKU (of type string) and Quantity (of type long).

.ingest inline into table Purchases <|
    Shoes,1000
    Wide Shoes,50
    "Coats black",20
    "Coats with ""quotes""",5

Ingest with bracket syntax

The following command ingests data into a table Logs with two columns: Date (of type datetime) and EventDetails (of type dynamic).

.ingest inline into table Logs
    [2015-01-01,"{""EventType"":""Read"", ""Count"":""12""}"]
    [2015-01-01,"{""EventType"":""Write"", ""EventValue"":""84""}"]

4.2 - .show data operations

Learn how to use the .show data operations command to return data operations that reached a final state.

Returns a table with data operations that reached a final state. Data operations are available for 30 days from when they ran.

Any operation that results in new extents (data shards) added to the system is considered a data operation.

Permissions

You must have Database Admin or Database Monitor permissions to see any data operations invoked on your database.

Any user can see their own data operations.

For more information, see Kusto role-based access control.

Syntax

.show data operations

Returns

This command returns a table with the following columns:

Output parameterTypeDescription
TimestampdatetimeThe time when the operation reached its final state.
DatabasestringThe database name.
TablestringThe table name.
ClientActivityIdstringThe operation client activity ID.
OperationKindstringOne of BatchIngest, SetOrAppend, RowStoreSeal, MaterializedView, QueryAcceleration, and UpdatePolicy.
OriginalSizelongThe original size of the ingested data.
ExtentSizelongThe extent size.
RowCountlongThe number of rows in the extent.
ExtentCountintThe number of extents.
TotalCputimespanThe total CPU time used by the data operation.
DurationtimespanThe duration of the operation.
PrincipalstringThe identity that initiated the data operation.
PropertiesdynamicAdditional information about the data operation.

Example

The following example returns information about UpdatePolicy, BatchIngest, and SetOrAppend operations.

.show data operations

Output

TimestampDatabaseTableClientActivityIdOperationKindOriginalSizeExtentSizeRowCountExtentCountTotalCpuDurationPrincipalProperties
2024-07-18 15:21:10.5432134TestLogsUTResultsDM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1UpdatePolicy100,82975,578279100:00:00.265625000:00:28.9101535aadapp=xxx{“SourceTable”: “UTLogs”}
2024-07-18 15:21:12.9481819TestLogsUTLogsDM.IngestionExecutor;abcd1234-1234-1234-abcd-1234abcdce;1BatchIngest1,045,027,298123,067,9471,688,705200:00:22.984375000:00:29.9745733aadapp=xxx{“Format”: “Csv”,“NumberOfInputStreams”:2}
2024-07-18 15:21:16.1095441KustoAutoIncidentKustoGPTSummarycdef12345-6789-ghij-0123-klmn45678SetOrAppend1,4203,1901100:00:00.015625000:00:00.0638211aaduser=xxx

4.3 - Data formats supported for ingestion

Learn about the various data and compression formats supported for ingestion.

Data ingestion is the process by which data is added to a table and is made available for query. For all ingestion methods, other than ingest-from-query, the data must be in one of the supported formats. The following table lists and describes the formats that are supported for data ingestion.

For more information about why ingestion might fail, see Ingestion failures and Ingestion error codes in Azure Data Explorer.

FormatExtensionDescription
ApacheAvro.avroAn AVRO format with support for logical types. The following compression codecs are supported: null, deflate, and snappy. Reader implementation of the apacheavro format is based on the official Apache Avro library. For information about ingesting Event Hub Capture Avro files, see Ingesting Event Hub Capture Avro files.
Avro.avroA legacy implementation for AVRO format based on .NET library. The following compression codecs are supported: null, deflate (for snappy - use ApacheAvro data format).
CSV.csvA text file with comma-separated values (,). See RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files.
JSON.jsonA text file with JSON objects delimited by \n or \r\n. See JSON Lines (JSONL).
MultiJSON.multijsonA text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, \n or \r\n. Each property bag can be spread on multiple lines.
ORC.orcAn ORC file.
Parquet.parquetA Parquet file.
PSV.psvA text file with pipe-separated values (|).
RAW.rawA text file whose entire contents is a single string value.
SCsv.scsvA text file with semicolon-separated values (;).
SOHsv.sohsvA text file with SOH-separated values. (SOH is ASCII codepoint 1; this format is used by Hive on HDInsight.)
TSV.tsvA text file with tab-separated values (\t).
TSVE.tsvA text file with tab-separated values (\t). A backslash character (\) is used for escaping.
TXT.txtA text file with lines delimited by \n. Empty lines are skipped.
W3CLOGFILE.logWeb log file format standardized by the W3C.

For more information about ingesting data using the json or multijson formats, see ingest json formats.

Supported data compression formats

Blobs and files can be compressed through any of the following compression algorithms:

CompressionExtension
gzip.gz
zip.zip

Indicate compression by appending the extension to the name of the blob or file.

For example:

  • MyData.csv.zip indicates a blob or a file formatted as CSV, compressed with zip (archive or a single file)
  • MyData.json.gz indicates a blob or a file formatted as JSON, compressed with gzip.

Blob or file names that include only the compression extension and not the format extension (for example, MyData.zip) are also supported. In this case, the file format must be specified as an ingestion property because it can't be inferred.
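
For example, the following sketch (the table name and blob URI are hypothetical) ingests a compression-only file name and passes the format explicitly as an ingestion property:

.ingest into table T ('https://contoso.blob.core.windows.net/container/MyData.zip;...')
  with (format='csv')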

4.4 - Data ingestion properties

Learn about the various data ingestion properties.

Data ingestion is the process by which data is added to a table and is made available for query. You add properties to the ingestion command after the with keyword.
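
For example, a minimal sketch (the table name and blob URI are hypothetical) that passes the format and tags ingestion properties after the with keyword:

.ingest into table MyTable ('https://contoso.blob.core.windows.net/container/file1.csv;...')
  with (format='csv', tags='["drop-by:2024-07-18"]')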

4.5 - Ingest from query

4.5.1 - .cancel operation command

Learn how to use the .cancel operation command to cancel a long-running operation.

This command cancels a long-running ingest from query operation. This command is useful when the operation is taking too long and you would like to abort it while running.

The cancel operation command isn’t guaranteed to succeed. The output of the .cancel operation command indicates whether or not cancellation was successful.

Syntax

.cancel operation OperationId [with ( reason = ReasonPhrase )]

Parameters

NameTypeRequiredDescription
OperationIdguid✔️A guid of the operation ID returned from the running command.
ReasonPhrasestringThe reason for canceling the running command.

Returns

Output parameterTypeDescription
OperationIdguidThe operation ID of the operation that was canceled.
OperationstringThe operation kind that was canceled.
StartedOndatetimeThe start time of the operation that was canceled.
CancellationStatestringReturns one of the following options:
Cancelled successfully: the operation was canceled
Cancel failed: the operation can’t be canceled at this point. The operation may still be running or may have completed.
ReasonPhrasestringReason why cancellation wasn’t successful.

Example

.cancel operation 078b2641-f10d-4694-96f8-1ee2b75dda48 with(Reason="Command canceled by me")
OperationIdOperationStartedOnCancellationStateReasonPhrase
c078b2641-f10d-4694-96f8-1ee2b75dda48TableSetOrAppend2022-07-18 09:03:55.1387320Canceled successfullyCommand canceled by me

4.5.2 - Kusto query ingestion (set, append, replace)

Learn how to use the .set, .append, .set-or-append, and .set-or-replace commands to ingest data from a query.

These commands execute a query or a management command and ingest the results of the query into a table. The difference between these commands is how they treat existing or nonexistent tables and data.

CommandIf table existsIf table doesn’t exist
.setThe command fails.The table is created and data is ingested.
.appendData is appended to the table.The command fails.
.set-or-appendData is appended to the table.The table is created and data is ingested.
.set-or-replaceData replaces the data in the table.The table is created and data is ingested.

To cancel an ingest from query command, see cancel operation.

Permissions

To perform different actions on a table, you need specific permissions:

  • To add rows to an existing table using the .append command, you need a minimum of Table Ingestor permissions.
  • To create a new table using the various .set commands, you need a minimum of Database User permissions.
  • To replace rows in an existing table using the .set-or-replace command, you need a minimum of Table Admin permissions.

For more information on permissions, see Kusto role-based access control.

Syntax

(.set | .append | .set-or-append | .set-or-replace) [async] tableName [with (propertyName = propertyValue [, …])] <| queryOrCommand

Parameters

NameTypeRequiredDescription
asyncstringIf specified, the command returns immediately and continues ingestion in the background. Use the returned OperationId with the .show operations command to retrieve the ingestion completion status and results.
tableNamestring✔️The name of the table to ingest data into. The tableName is always relative to the database in context.
propertyName, propertyValuestringOne or more supported ingestion properties used to control the ingestion process.
queryOrCommandstring✔️The text of a query or a management command whose results are used as data to ingest. Only .show management commands are supported.

Performance tips

  • Set the distributed property to true if the amount of data produced by the query is large, exceeds one gigabyte (GB), and doesn’t require serialization. Then, multiple nodes can produce output in parallel. Don’t use this flag when query results are small, since it might needlessly generate many small data shards.
  • Data ingestion is a resource-intensive operation that might affect concurrent activities on the database, including running queries. Avoid running too many ingestion commands at the same time.
  • Limit the data for ingestion to less than one GB per ingestion operation. If necessary, use multiple ingestion commands.

Supported ingestion properties

PropertyTypeDescription
distributedboolIf true, the command ingests from all nodes executing the query in parallel. Default is false. See performance tips.
creationTimestringThe datetime value, formatted as an ISO8601 string, to use at the creation time of the ingested data extents. If unspecified, now() is used. When specified, make sure the Lookback property in the target table’s effective Extents merge policy is aligned with the specified value.
extend_schemaboolIf true, the command might extend the schema of the table. Default is false. This option applies only to .append, .set-or-append, and set-or-replace commands. This option requires at least Table Admin permissions.
recreate_schemaboolIf true, the command might recreate the schema of the table. Default is false. This option applies only to the .set-or-replace command. This option takes precedence over the extend_schema property if both are set. This option requires at least Table Admin permissions.
folderstringThe folder to assign to the table. If the table already exists, this property overwrites the table’s folder.
ingestIfNotExistsstringIf specified, ingestion fails if the table already has data tagged with an ingest-by: tag with the same value. For more information, see ingest-by: tags.
policy_ingestiontimeboolIf true, the Ingestion Time Policy is enabled on the table. The default is true.
tagsstringA JSON string that represents a list of tags to associate with the created extent.
docstringstringA description used to document the table.
persistDetailsboolA Boolean value that, if specified, indicates that the command should persist the detailed results for retrieval by the .show operation details command. Defaults to false. Example: with (persistDetails=true)
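
For example, the following sketch (the table names are hypothetical) appends query results and allows the target table's schema to be extended with any new columns produced by the query:

.set-or-append MyTable with (extend_schema=true) <|
   SourceTable
   | extend IngestedOn = now()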

Schema considerations

  • .set-or-replace preserves the schema unless one of extend_schema or recreate_schema ingestion properties is set to true.
  • .set-or-append and .append commands preserve the schema unless the extend_schema ingestion property is set to true.
  • Matching the result set schema to that of the target table is based on the column types. There’s no matching of column names. Make sure that the query result schema columns are in the same order as the table, otherwise data is ingested into the wrong columns.

Character limitation

The command fails if the query generates an entity name with the $ character. The entity names must comply with the naming rules, so the $ character must be removed for the ingest command to succeed.

For example, in the following query, the search operator generates a column $table. To store the query results, use project-rename to rename the column.

.set Texas <| search State has 'Texas' | project-rename tableName=$table

Returns

Returns information on the extents created because of the .set or .append command.

Examples

Create and update table from query source

The following query creates the RecentErrors table with the same schema as LogsTable. It updates RecentErrors with all error logs from LogsTable over the last hour.

.set RecentErrors <|
   LogsTable
   | where Level == "Error" and Timestamp > now() - time(1h)

Create and update table from query source using the distributed flag

The following example creates a new table called OldExtents in the database, asynchronously. The dataset is expected to be bigger than one GB (more than ~one million rows) so the distributed flag is used. It updates OldExtents with ExtentId entries from the MyExtents table that were created more than 30 days ago.

.set async OldExtents with(distributed=true) <|
   MyExtents 
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Append data to table

The following example filters ExtentId entries in the MyExtents table that were created more than 30 days ago and appends the entries to the OldExtents table with associated tags.

.append OldExtents with(tags='["TagA","TagB"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Create or append a table with possibly existing tagged data

The following example either appends to or creates the OldExtents table asynchronously. It filters ExtentId entries in the MyExtents table that were created more than 30 days ago and specifies the tags to append to the new extents with ingest-by:myTag. The ingestIfNotExists parameter ensures that the ingestion only occurs if the data doesn’t already exist in the table with the specified tag.

.set-or-append async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <|
   MyExtents
   | where CreatedOn < now() - time(30d)
   | project ExtentId

Create table or replace data with associated data

The following query replaces the data in the OldExtents table, or creates the table if it doesn’t already exist, with ExtentId entries in the MyExtents table that were created more than 30 days ago. Tag the new extent with ingest-by:myTag if the data doesn’t already exist in the table with the specified tag.

.set-or-replace async OldExtents with(tags='["ingest-by:myTag"]', ingestIfNotExists='["myTag"]') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId

Append data with associated data

The following example appends data to the OldExtents table asynchronously, using ExtentId entries from the MyExtents table that were created more than 30 days ago. It sets a specific creation time for the new extents.

.append async OldExtents with(creationTime='2017-02-13T11:09:36.7992775Z') <| 
   MyExtents 
   | where CreatedOn < now() - time(30d) 
   | project ExtentId     

Sample output

The following is a sample of the type of output you may see from your queries.

ExtentIdOriginalSizeExtentSizeCompressedSizeIndexSizeRowCount
23a05ed6-376d-4119-b1fc-6493bcb05563129158821568431410

4.6 - Kusto.ingest into command (pull data from storage)

This article describes The .ingest into command (pull data from storage).

The .ingest into command ingests data into a table by “pulling” the data from one or more cloud storage files. For example, the command can retrieve 1,000 CSV-formatted blobs from Azure Blob Storage, parse them, and ingest them together into a single target table. Data is appended to the table without affecting existing records, and without modifying the table’s schema.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest [async] into table TableName SourceDataLocator [with ( IngestionPropertyName = IngestionPropertyValue [, …] )]

Parameters

NameTypeRequiredDescription
asyncstringIf specified, the command returns immediately and continues ingestion in the background. The results of the command include an OperationId value that can then be used with the .show operation command to retrieve the ingestion completion status and results.
TableNamestring✔️The name of the table into which to ingest data. The table name is always relative to the database in context. If no schema mapping object is provided, the schema of the database in context is used.
SourceDataLocatorstring✔️A single or comma-separated list of storage connection strings. A single connection string must refer to a single file hosted by a storage account. Ingestion of multiple files can be done by specifying multiple connection strings, or by ingesting from a query of an external table.

Authentication and authorization

Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.

The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.

Authentication methodAzure Blob Storage / Data Lake Storage Gen2Data Lake Storage Gen1
ImpersonationStorage Blob Data ReaderReader
Shared Access (SAS) tokenList + ReadThis authentication method isn’t supported in Gen1.
Microsoft Entra access token
Storage account access keyThis authentication method isn’t supported in Gen1.
Managed identityStorage Blob Data ReaderReader

Returns

The result of the command is a table with as many records as there are data shards (“extents”) generated by the command. If no data shards were generated, a single record is returned with an empty (zero-valued) extent ID.

NameTypeDescription
ExtentIdguidThe unique identifier for the data shard that was generated by the command.
ItemLoadedstringOne or more storage files that are related to this record.
DurationtimespanHow long it took to perform ingestion.
HasErrorsboolWhether or not this record represents an ingestion failure.
OperationIdguidA unique ID representing the operation. Can be used with the .show operation command.

Examples

Azure Blob Storage with shared access signature

The following example instructs your database to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table T. The ... represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Obfuscated strings (the h in front of the string values) are used to ensure that the SAS is never recorded.

.ingest into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...',
    h'https://contoso.blob.core.windows.net/container/file2.csv?...'
)

Azure Blob Storage with managed identity

The following example shows how to read a CSV file from Azure Blob Storage and ingest its contents into table T using managed identity authentication. Authentication uses the managed identity ID (object ID) assigned to the Azure Blob Storage in Azure. For more information, see Create a managed identity for storage containers.

.ingest into table T ('https://StorageAccount.blob.core.windows.net/Container/file.csv;managed_identity=802bada6-4d21-44b2-9d15-e66b29e4d63e')

Azure Data Lake Storage Gen 2

The following example is for ingesting data from Azure Data Lake Storage Gen 2 (ADLSv2). The credentials used here (...) are the storage account credentials (shared key), and we use string obfuscation only for the secret part of the connection string.

.ingest into table T (
  'abfss://myfilesystem@contoso.dfs.core.windows.net/path/to/file1.csv;...'
)

Azure Data Lake Storage

The following example ingests a single file from Azure Data Lake Storage (ADLS). It uses the user’s credentials to access ADLS (so there’s no need to treat the storage URI as containing a secret). It also shows how to specify ingestion properties.

.ingest into table T ('adl://contoso.azuredatalakestore.net/Path/To/File/file1.ext;impersonate')
  with (format='csv')

Amazon S3 with an access key

The following example ingests a single file from Amazon S3 using an access key ID and a secret access key.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv;AwsCredentials=AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
  with (format='csv')

Amazon S3 with a presigned URL

The following example ingests a single file from Amazon S3 using a presigned URL.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv?<presigned URL token>')
  with (format='csv')

4.7 - Streaming ingestion

4.7.1 - Clearing cached schema for streaming ingestion

This article describes management command for clearing cached database schema.

Nodes cache the schema of the databases that receive data via streaming ingestion. This process optimizes performance and utilization of resources, but can cause propagation delays when the schema changes.

Clear the cache to guarantee that subsequent streaming ingestion requests incorporate database or table schema changes. For more information, see Streaming ingestion and schema changes.

Permissions

You must have at least Database Ingestor permissions to run this command.

Syntax

.clear table TableName cache streamingingestion schema

.clear database cache streamingingestion schema

Parameters

NameTypeRequiredDescription
TableNamestring✔️The name of the table for which to clear the cache.

Returns

This command returns a table with the following columns:

ColumnTypeDescription
NodeIdstringIdentifier of the node
StatusstringSucceeded/Failed

Example

.clear database cache streamingingestion schema

.clear table T1 cache streamingingestion schema
NodeIdStatus
Node1Succeeded
Node2Failed

4.7.2 - Streaming ingestion and schema changes

This article discusses options of handling schema changes with streaming ingestion.

Cluster and Eventhouse nodes cache the schema of databases that get data through streaming ingestion, boosting performance and resource use. However, schema changes can lead to delays in updates.

If schema changes and streaming ingestion aren’t synchronized, you can encounter failures like schema-related errors or incomplete and distorted data in the table.

This article outlines typical schema changes and provides guidance on avoiding problems with streaming ingestion during these changes.

Schema changes

The following list covers key examples of schema changes:

Coordinate schema changes with streaming ingestion

The schema cache is kept while the database is online. If there are schema changes, the system automatically refreshes the cache, but this refresh can take several minutes. If you rely on the automatic refresh, you can experience uncoordinated ingestion failures.

You can reduce the effects of propagation delay by explicitly clearing the schema cache on the nodes. If the streaming ingestion flow and schema changes are coordinated, you can completely eliminate failures and their associated data distortion.

To coordinate the streaming ingestion flow with schema changes:

  1. Suspend streaming ingestion.
  2. Wait until all outstanding streaming ingestion requests are complete.
  3. Do schema changes.
  4. Issue one or more .clear cache streamingingestion schema commands (see the example after this list).
    • Repeat until successful and all rows in the command output indicate success.
  5. Resume streaming ingestion.
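
For example, after completing the schema changes in step 3, you would run the cache-clearing commands shown earlier (the table name here is hypothetical), repeating them until every row in the output indicates success:

.clear database cache streamingingestion schema

.clear table MyTable cache streamingingestion schema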

5 - Database cursors

5.1 - Database cursors

Learn how to use database cursors to query a database multiple times.

A database cursor is a database-level object that lets you query a database multiple times. You get consistent results even if there are data-append or data-retention operations happening in parallel with the queries.

Database cursors are designed to address two important scenarios:

  • The ability to repeat the same query multiple times and get the same results, as long as the query indicates “same data set”.

  • The ability to make an “exactly once” query. This query only “sees” the data that a previous query didn’t see, because the data wasn’t available then. The query lets you iterate, for example, through all the newly arrived data in a table without fear of processing the same record twice or skipping records by mistake.

The database cursor is represented in the query language as a scalar value of type string. The actual value should be considered opaque and there’s no support for any operation other than to save its value or use the cursor functions noted below.

Cursor functions

Kusto provides three functions to help implement the two above scenarios:

  • cursor_current(): Use this function to retrieve the current value of the database cursor. You can use this value as an argument to the two other functions.

  • cursor_after(rhs:string): This special function can be used on table records that have the IngestionTime policy enabled. It returns a scalar value of type bool indicating whether the record’s ingestion_time() database cursor value comes after the rhs database cursor value.

  • cursor_before_or_at(rhs:string): This special function can be used on the table records that have the IngestionTime policy enabled. It returns a scalar value of type bool indicating whether the record’s ingestion_time() database cursor value comes before or at the rhs database cursor value.

The two special functions (cursor_after and cursor_before_or_at) also have a side-effect: When they’re used, Kusto will emit the current value of the database cursor to the @ExtendedProperties result set of the query. The property name for the cursor is Cursor, and its value is a single string.

For example:

{"Cursor" : "636040929866477946"}

Restrictions

Database cursors can only be used with tables for which the IngestionTime policy has been enabled. Each record in such a table is associated with the value of the database cursor that was in effect when the record was ingested. As such, the ingestion_time() function can be used.

The database cursor object holds no meaningful value unless the database has at least one table that has an IngestionTime policy defined. This value is guaranteed to update, as-needed by the ingestion history, into such tables and the queries run, that reference such tables. It might, or might not, be updated in other cases.

The ingestion process first commits the data, so that it’s available for querying, and only then assigns an actual cursor value to each record. If you attempt to query for data immediately following the ingestion completion using a database cursor, the results might not yet incorporate the last records added, because they haven’t yet been assigned the cursor value. Also, retrieving the current database cursor value repeatedly might return the same value, even if ingestion was done in between, because only a cursor commit can update its value.

Querying a table based on database cursors is only guaranteed to “work” (providing exactly-once guarantees) if the records are ingested directly into that table. If you're using extents commands, such as .move extents or .replace extents, to move data into the table, or if you're using .rename table, then querying this table using database cursors isn't guaranteed not to miss any data. This is because the ingestion time of the records is assigned when they're initially ingested, and doesn't change during the move extents operation. Therefore, when the extents are moved into the target table, it's possible that the cursor value assigned to the records in these extents was already processed (and the next query by database cursor will miss the new records).

Example: Processing records exactly once

For a table Employees with schema [Name, Salary], to continuously process new records as they’re ingested into the table, use the following process:

// [Once] Enable the IngestionTime policy on table Employees
.set table Employees policy ingestiontime true

// [Once] Get all the data that the Employees table currently holds 
Employees | where cursor_after('')

// The query above will return the database cursor value in
// the @ExtendedProperties result set. Lets assume that it returns
// the value '636040929866477946'

// [Many] Get all the data that was added to the Employees table
// since the previous query was run, using the previously-returned
// database cursor
Employees | where cursor_after('636040929866477946')
// The query also returns a new cursor value in @ExtendedProperties for the next iteration

6 - Plugin commands

7 - Policies

7.1 - Policies overview

Learn which policies are available for management.

The following table provides an overview of the policies for managing your environment:

PolicyDescription
Auto delete policySets an expiry date for the table. The table is automatically deleted at this expiry time.
Cache policyDefines how to prioritize resources. Allows customers to differentiate between hot data cache and cold data cache.
Callout policyManages the authorized domains for external calls.
Capacity policyControls the compute resources of data management operations.
Encoding policyDefines how data is encoded, compressed, and indexed.
Extent tags retention policyControls the mechanism that automatically removes extent tags from tables.
Ingestion batching policyGroups multiple data ingestion requests into batches for more efficient processing.
Ingestion time policyAdds a hidden datetime column to the table that records the time of ingestion.
ManagedIdentity policyControls which managed identities can be used for what purposes.
Merge policyDefines rules for merging data from different extents into a single extent.
Mirroring policyAllows you to manage your mirroring policy and mirroring policy operations.
Partitioning policyDefines rules for partitioning extents for a specific table or a materialized view.
Retention policyControls the mechanism that automatically removes data from tables or materialized views.
Restricted view access policyAdds an extra layer of permission requirements for principals to access and view the table.
Row level security policyDefines rules for access to rows in a table based on group membership or execution context.
Row order policyMaintains a specific order for rows within an extent.
Sandbox policyControls the usage and behavior of sandboxes, which are isolated environments for query execution.
Sharding policyDefines rules for how extents are created.
Streaming ingestion policyConfigurations for streaming data ingestion.
Update policyAllows for data to be appended to a target table upon adding data to a source table.
Query weak consistency policyControls the level of consistency for query results.

7.2 - Auto delete

7.2.1 - Auto delete policy

Learn about the auto delete policy to set an expiry date for the table.

An auto delete policy on a table sets an expiry date for the table. The table is automatically deleted at this expiry time. Unlike the retention policy, which determines when data (extents) are removed from a table, the auto delete policy drops the entire table.

The auto delete policy can be useful for temporary staging tables. Temporary staging tables are used for data preparation, until the data is moved to its permanent location. We recommend explicitly dropping temporary tables when they’re no longer needed. Only use the auto delete policy as a fallback mechanism in case the explicit deletion doesn’t occur.

Policy object

An auto delete policy includes the following properties:

  • ExpiryDate:

    • Date and time value indicating when the table should be deleted.
    • The deletion time is imprecise, and could occur a few hours later than the time specified in the ExpiryDate property.
    • The value specified can't be null and must be greater than the current time.
  • DeleteIfNotEmpty:

    • A Boolean value indicating whether the table should be dropped even if there are still extents in it.
    • Defaults to false.
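
As a minimal sketch, a policy object using the properties above might look like the following (the expiry date is a placeholder):

{
  "ExpiryDate": "2027-01-01T00:00:00",
  "DeleteIfNotEmpty": false
}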

For more information, see auto delete policy commands.

7.3 - Caching

7.3.1 - Caching policy (hot and cold cache)

This article describes caching policy (hot and cold cache).

To ensure fast query performance, a multi-tiered data cache system is used. Data is stored in reliable storage but parts of it are cached on processing nodes, SSD, or even in RAM for faster access.

The caching policy allows you to choose which data should be cached. You can differentiate between hot data cache and cold data cache by setting a caching policy on hot data. Hot data is kept in local SSD storage for faster query performance, while cold data is stored in reliable storage, which is cheaper but slower to access.

The cache uses 95% of the local SSD disk for hot data. If there isn’t enough space, the most recent data is preferentially kept in the cache. The remaining 5% is used for data that isn’t categorized as hot. This design ensures that queries loading lots of cold data won’t evict hot data from the cache.

The best query performance is achieved when all ingested data is cached. However, certain data might not warrant the expense of being kept in the hot cache. For instance, infrequently accessed old log records might be considered less crucial. In such cases, teams often opt for lower querying performance over paying to keep the data warm.

Use management commands to alter the caching policy at the cluster, database, table, or materialized view level.
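
For example, the following sketch (the table name is hypothetical) keeps the last 14 days of ingested data in the hot cache for a single table:

.alter table MyTable policy caching hot = 14d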

How caching policy is applied

When data is ingested, the system keeps track of the date and time of the ingestion, and of the extent that was created. The extent’s ingestion date and time value (or maximum value, if an extent was built from multiple preexisting extents), is used to evaluate the caching policy.

By default, the effective policy is null, which means that all the data is considered hot. A null policy at the table level means that the policy is inherited from the database. A non-null table-level policy overrides a database-level policy.

Scoping queries to hot cache

When running queries, you can limit the scope to only query data in hot cache.

There are several query possibilities:

  • Add a client request property called query_datascope to the query. Possible values: default, all, and hotcache.
  • Use a set statement in the query text: set query_datascope='...'. Possible values are the same as for the client request property.
  • Add a datascope=... text immediately after a table reference in the query body. Possible values are all and hotcache.

The default value indicates use of the default settings, which determine that the query should cover all data.

If there’s a discrepancy between the different methods, then set takes precedence over the client request property. Specifying a value for a table reference takes precedence over both.

For example, in the following query, all table references use hot cache data only, except for the second reference to “T” that is scoped to all the data:

set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X

Caching policy vs retention policy

Caching policy is independent of retention policy:

  • Caching policy defines how to prioritize resources. Queries for important data are faster.
  • Retention policy defines the extent of the queryable data in a table/database (specifically, SoftDeletePeriod).

Configure this policy to achieve the optimal balance between cost and performance, based on the expected query pattern.

Example:

  • SoftDeletePeriod = 56d
  • hot cache policy = 28d

In the example, the last 28 days of data is stored on the SSD and the additional 28 days of data is stored in Azure blob storage. You can run queries on the full 56 days of data.

7.4 - Callout

7.4.1 - Callout policy

Learn how to update a cluster’s callout policy to manage authorized domains for external calls.

Your cluster can communicate with external services in many different scenarios. Cluster administrators can manage the authorized domains for external calls by updating the cluster’s callout policy.

Supported properties of a callout

A callout policy is composed of the following properties:

NameTypeDescription
CalloutTypestringDefines the type of callout, and can be one of types listed in callout types.
CalloutUriRegexstringA regular expression whose matches define the resource domains covered by the policy.
CanCallboolIndicates whether external calls to domains matching CalloutUriRegex are permitted (true) or denied (false).
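
For example, a sketch that permits SQL callouts to a hypothetical domain, assuming the .alter-merge cluster policy callout management command (the domain regex is illustrative only):

.alter-merge cluster policy callout @'[{"CalloutType": "sql", "CalloutUriRegex": "sqlserver\\.contoso\\.com/?$", "CanCall": true}]'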

Types of callout

Callout policies are managed at cluster-level and are classified into the following types:

Callout policy typeDescription
kustoControls cross-cluster queries.
sqlControls the SQL plugin.
mysqlControls the MySQL plugin.
postgresqlControls the PostgreSql plugin.
azure_digital_twinsControls the Azure Digital Twins plugin.
cosmosdbControls the Cosmos DB plugin.
sandbox_artifactsControls sandboxed plugins (python and R).
external_dataControls access to external data through external tables or externaldata operator.
webapiControls access to http endpoints.
azure_openaiControls calls to Azure OpenAI plugins such as the embedding plugin ai_embed_text plugin.

Predefined callout policies

The following table shows a set of predefined callout policies that are preconfigured on your cluster to enable callouts to selected services:

ServiceDesignationPermitted domains
KustoCross cluster queries[a-z0-9]{3,22}\\.(\\w+\\.)?kusto(mfa)?\\.windows\\.net/?$
KustoCross cluster queries`^https://[a-z0-9]{3,22}\.[a-z0-9-]{1,50}\.(kusto\.azuresynapse
KustoCross cluster queries`^https://([A-Za-z0-9]+\.)?(ade
Azure DBSQL requests[a-z0-9][a-z0-9\\-]{0,61}[a-z0-9]?\\.database\\.windows\\.net/?$
Synapse AnalyticsSQL requests[a-z0-9-]{0,61}?(-ondemand)?\\.sql\\.azuresynapse(-dogfood)?\\.net/?$
External DataExternal data.*
Azure Digital TwinsAzure Digital Twins[A-Za-z0-9\\-]{3,63}\\.api\\.[A-Za-z0-9]+\\.digitaltwins\\.azure\\.net/?$

You can view more of the predefined policies configured on your cluster with the following query:

.show cluster policy callout 
| where EntityType == 'Cluster immutable policy'
| project Policy

Remarks

If an external resource of a given type matches more than one policy defined for that type, and at least one of the matched policies has its CanCall property set to false, access to the resource is denied.

7.5 - Capacity

7.5.1 - Capacity policy

Learn how to use the capacity policy to control the compute resources of data management operations on a cluster.

A capacity policy is used for controlling the compute resources of data management operations on the cluster.

The capacity policy object

The capacity policy is made of the following components:

To view the capacity of your cluster, use the .show capacity command.

Ingestion capacity

PropertyTypeDescription
ClusterMaximumConcurrentOperationslongThe maximum number of concurrent ingestion operations allowed in a cluster. This value caps the total ingestion capacity, as shown in the following formula.
CoreUtilizationCoefficientrealDetermines the percentage of cores to use in the ingestion capacity calculation.

Formula

The .show capacity command returns the cluster’s ingestion capacity based on the following formula:

Minimum(ClusterMaximumConcurrentOperations , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))
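
As a worked example using the default values shown later in this article (ClusterMaximumConcurrentOperations = 512, CoreUtilizationCoefficient = 0.75), a hypothetical 10-node cluster with 16 cores per node has an ingestion capacity of:

Minimum(512, 10 * Maximum(1, 16 * 0.75)) = Minimum(512, 120) = 120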

Extents merge capacity

PropertyTypeDescription
MinimumConcurrentOperationsPerNodelongThe minimal number of concurrent extents merge/rebuild operations on a single node. Default is 1.
MaximumConcurrentOperationsPerNodelongThe maximum number of concurrent extents merge/rebuild operations on a single node. Default is 5.
ClusterMaximumConcurrentOperationslongThe maximum number of concurrent extents merge/rebuild operations allowed in a cluster. This value caps the total merge capacity.

Formula

The .show capacity command returns the cluster’s extents merge capacity based on the following formula:

Minimum(Number of nodes in cluster * Concurrent operations per node, ClusterMaximumConcurrentOperations)

The effective value for Concurrent operations per node is automatically adjusted by the system in the range [MinimumConcurrentOperationsPerNode,MaximumConcurrentOperationsPerNode], as long as the success rate of the merge operations is 90% or higher.

Extents purge rebuild capacity

PropertyTypeDescription
MaximumConcurrentOperationsPerNodelongThe maximum number of concurrent rebuild extents for purge operations on a single node.

Formula

The .show capacity command returns the cluster’s extents purge rebuild capacity based on the following formula:

Number of nodes in cluster x MaximumConcurrentOperationsPerNode

Export capacity

PropertyTypeDescription
ClusterMaximumConcurrentOperationslongThe maximum number of concurrent export operations in a cluster. This value caps the total export capacity, as shown in the following formula.
CoreUtilizationCoefficientrealDetermines the percentage of cores to use in the export capacity calculation.

Formula

The .show capacity command returns the cluster’s export capacity based on the following formula:

Minimum(ClusterMaximumConcurrentOperations , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))

Extents partition capacity

PropertyTypeDescription
ClusterMinimumConcurrentOperationslongThe minimal number of concurrent extents partition operations in a cluster. Default is 1.
ClusterMaximumConcurrentOperationslongThe maximum number of concurrent extents partition operations in a cluster. Default is 32.

The effective value for Concurrent operations is automatically adjusted by the system in the range [ClusterMinimumConcurrentOperations,ClusterMaximumConcurrentOperations], as long as the success rate of the partitioning operations is 90% or higher.

Materialized views capacity policy

The policy can be used to change concurrency settings for materialized views. Changing the materialized views capacity policy can be useful when there’s more than a single materialized view defined on a cluster.

PropertyTypeDescription
ClusterMinimumConcurrentOperationslongThe minimal number of concurrent materialization operations in a cluster. Default is 1.
ClusterMaximumConcurrentOperationslongThe maximum number of concurrent materialization operations in a cluster. Default is 10.

By default, only a single materialization runs concurrently (see how materialized views work). The system adjusts the current concurrency in the range [ClusterMinimumConcurrentOperations,ClusterMaximumConcurrentOperations], based on the number of materialized views in the cluster and the cluster’s CPU. You can increase/decrease concurrency by altering this policy. For example, if the cluster has 10 materialized views, setting the ClusterMinimumConcurrentOperations to five ensures that at least five of them can materialize concurrently. You can view the effective value for the current concurrency using the .show capacity command

Stored query results capacity

PropertyTypeDescription
MaximumConcurrentOperationsPerDbAdminlongThe maximum number of concurrent stored query results creation operations on a cluster admin node.
CoreUtilizationCoefficientrealDetermines the percentage of cores to use in the stored query results creation calculation.

Formula

The .show capacity command returns the cluster’s stored query results creation capacity based on the following formula:

Minimum(MaximumConcurrentOperationsPerDbAdmin , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))

Streaming ingestion post processing capacity

PropertyTypeDescription
MaximumConcurrentOperationsPerNodelongThe maximum number of concurrent streaming ingestion post processing operations on each cluster node.

Formula

The .show capacity command returns the cluster’s streaming ingestion post processing capacity based on the following formula:

Number of nodes in cluster x MaximumConcurrentOperationsPerNode

Purge storage artifacts cleanup capacity

PropertyTypeDescription
MaximumConcurrentOperationsPerClusterlongThe maximum number of concurrent purge storage artifacts cleanup operations on cluster.

Formula

The .show capacity command returns the cluster’s purge storage artifacts cleanup capacity based on the following formula:

MaximumConcurrentOperationsPerCluster

Periodic storage artifacts cleanup capacity

PropertyTypeDescription
MaximumConcurrentOperationsPerClusterlongThe maximum number of concurrent periodic storage artifacts cleanup operations on cluster.

Formula

The .show capacity command returns the cluster’s periodic storage artifacts cleanup capacity based on the following formula:

MaximumConcurrentOperationsPerCluster

Query Acceleration capacity

PropertyTypeDescription
ClusterMaximumConcurrentOperationslongThe maximum number of concurrent query acceleration caching operations in a cluster. This value caps the total query acceleration caching capacity, as shown in the following formula.
CoreUtilizationCoefficientrealDetermines the percentage of cores to use in the query acceleration caching capacity calculation.

Formula

The .show capacity command returns the cluster’s query acceleration caching capacity based on the following formula:

Minimum(ClusterMaximumConcurrentOperations , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))

Defaults

The default capacity policy has the following JSON representation:

{
  "IngestionCapacity": {
    "ClusterMaximumConcurrentOperations": 512,
    "CoreUtilizationCoefficient": 0.75
  },
  "ExtentsMergeCapacity": {
    "MinimumConcurrentOperationsPerNode": 1,
    "MaximumConcurrentOperationsPerNode": 3
  },
  "ExtentsPurgeRebuildCapacity": {
    "MaximumConcurrentOperationsPerNode": 1
  },
  "ExportCapacity": {
    "ClusterMaximumConcurrentOperations": 100,
    "CoreUtilizationCoefficient": 0.25
  },
  "ExtentsPartitionCapacity": {
    "ClusterMinimumConcurrentOperations": 1,
    "ClusterMaximumConcurrentOperations": 32
  },
  "MaterializedViewsCapacity": {
    "ClusterMaximumConcurrentOperations": 1,
    "ExtentsRebuildCapacity": {
      "ClusterMaximumConcurrentOperations": 50,
      "MaximumConcurrentOperationsPerNode": 5
    }
  },
  "StoredQueryResultsCapacity": {
    "MaximumConcurrentOperationsPerDbAdmin": 250,
    "CoreUtilizationCoefficient": 0.75
  },
  "StreamingIngestionPostProcessingCapacity": {
    "MaximumConcurrentOperationsPerNode": 4
  },
  "PurgeStorageArtifactsCleanupCapacity": {
    "MaximumConcurrentOperationsPerCluster": 2
  },
  "PeriodicStorageArtifactsCleanupCapacity": {
    "MaximumConcurrentOperationsPerCluster": 2
  },
  "QueryAccelerationCapacity": {
    "ClusterMaximumConcurrentOperations": 100,
    "CoreUtilizationCoefficient": 0.5
  }
}
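
As a sketch, assuming the .alter-merge cluster policy capacity command, the following raises only the export concurrency cap while leaving all other defaults intact:

.alter-merge cluster policy capacity @'{"ExportCapacity": {"ClusterMaximumConcurrentOperations": 150}}'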

Management commands

Management commands throttling

Kusto limits the number of concurrent requests for the following user-initiated commands:

When the cluster detects that an operation exceeded the limit on concurrent requests:

  • The command’s state, as presented by System information commands, is Throttled.
  • The error message includes the command type, the origin of the throttling, and the capacity that was exceeded. For example:
    • The management command was aborted due to throttling. Retrying after some backoff might succeed. CommandType: 'TableSetOrAppend', Capacity: 18, Origin: 'CapacityPolicy/Ingestion'.
  • The HTTP response code is 429. The subcode is TooManyRequests.
  • The exception type is ControlCommandThrottledException.

7.6 - Encoding policy

7.6.1 - Encoding policy

This article describes the encoding policy.

The encoding policy defines how data is encoded, compressed, and indexed. This policy applies to all columns of stored data. A default encoding policy is applied based on the column’s data type, and a background process adjusts the encoding policy automatically if necessary.

Scenarios

We recommend keeping the default policy except in specific scenarios, where modifying a column's encoding policy can be useful to fine-tune the performance/COGS trade-off. For example:

  • The default indexing applied to string columns is built for term searches. If you only query for specific values in the column, COGS might be reduced if the index is simplified using the encoding profile Identifier. For more information, see the string data type.
  • For fields that are never queried, or that don't need fast searches, you can disable indexing. Use the BigObject profile to turn off the indexes and to increase the maximal value size in dynamic or string columns. For example, use this profile to store HLL values returned by the hll() function.

How it works

Encoding policy changes do not affect data that has already been ingested. Only new ingestion operations will be performed according to the new policy. The encoding policy applies to individual columns in a table, but can be set at the column level, table level (affecting all columns of the table), or database level.
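
As a sketch, assuming the column-level .alter column ... policy encoding command and a hypothetical table and column, the Identifier profile mentioned above could be applied like this:

.alter column MyTable.DeviceId policy encoding type='identifier'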

7.7 - Extent tags policy

7.7.1 - Extent tags retention policy

This article describes extent tags retention policies.

The extent tags retention policy controls the mechanism that automatically removes extent tags from tables, based on the age of the extents.

We recommend removing tags that are no longer helpful, or that were used temporarily as part of an ingestion pipeline, because leftover tags can prevent the system from reaching optimal performance. For example, old drop-by: tags can prevent extents from being merged together.

The policy can be set at the table-level, or at the database-level. A database-level policy applies to all tables in the database that don’t override the policy.

The policy object

The extent tags retention policy is an array of policy objects. Each object includes the following properties:

Property nameTypeDescriptionExample
TagPrefixstringThe prefix of the tags to be automatically deleted, once RetentionPeriod is exceeded. The prefix must include a colon (:) as its final character, and may only include one colon.drop-by:, ingest-by:, custom_prefix:
RetentionPeriodtimespanThe duration for which it’s guaranteed that the tags aren’t dropped. This period is measured starting from the extent’s creation time.1.00:00:00

Example

The following policy automatically drops any drop-by: tags older than three days and any ingest-by: tags older than two hours:

[
    {
        "TagPrefix": "drop-by:",
        "RetentionPeriod": "3.00:00:00"
    },
    {
        "TagPrefix": "ingest-by:",
        "RetentionPeriod": "02:00:00"
    }
]
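
A sketch of applying the policy above at the table level, assuming the .alter table ... policy extent_tags_retention command and a hypothetical table name:

.alter table MyTable policy extent_tags_retention @'[{"TagPrefix": "drop-by:", "RetentionPeriod": "3.00:00:00"}, {"TagPrefix": "ingest-by:", "RetentionPeriod": "02:00:00"}]'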

Defaults

By default, when the policy isn’t defined, extent tags of any kind are retained as long as the extent isn’t dropped.

Management commands

The following management commands can be used to manage the extent tags retention policy:

7.8 - Ingestion batching

7.8.1 - IngestionBatching policy

Learn how to use the IngestionBatching policy to optimize batching for ingestion.

Overview

During the queued ingestion process, the service optimizes for throughput by batching small ingress data chunks together before ingestion. Batching reduces the resources consumed by the queued ingestion process and doesn’t require post-ingestion resources to optimize the small data shards produced by non-batched ingestion.

The downside to doing batching before ingestion is the forced delay. Therefore, the end-to-end time from requesting the data ingestion until the data is ready for query is longer.

When you define the IngestionBatching policy, you’ll need to find a balance between optimizing for throughput and time delay. This policy applies to queued ingestion. It defines the maximum forced delay allowed when batching small blobs together. To learn more about using batching policy commands, and optimizing for throughput, see:

Sealing a batch

There’s an optimal size of about 1 GB of uncompressed data for bulk ingestion. Ingestion of blobs with much less data is suboptimal, so in queued ingestion the service will batch small blobs together.

The following list shows the basic batching policy triggers to seal a batch. A batch is sealed and ingested when the first condition is met:

  • Size: Batch size limit reached or exceeded
  • Count: Batch file number limit reached
  • Time: Batching time has expired

The IngestionBatching policy can be set on databases or tables. Default values are as follows: 5 minutes maximum delay time, 500 items, total size of 1 GB.

The following list shows conditions to seal batches related to single blob ingestion. A batch is sealed and ingested when the conditions are met:

  • SingleBlob_FlushImmediately: Ingest a single blob because ‘FlushImmediately’ was set
  • SingleBlob_IngestIfNotExists: Ingest a single blob because ‘IngestIfNotExists’ was set
  • SingleBlob_IngestByTag: Ingest a single blob because ‘ingest-by’ was set
  • SingleBlob_SizeUnknown: Ingest a single blob because blob size is unknown

If the SystemFlush condition is set, a batch will be sealed when a system flush is triggered. With the SystemFlush parameter set, the system flushes the data, for example due to database scaling or internal reset of system components.

Defaults and limits

Type | Property | Default | Low latency setting | Minimum value | Maximum value
Number of items | MaximumNumberOfItems | 500 | 500 | 1 | 25,000
Data size (MB) | MaximumRawDataSizeMB | 1024 | 1024 | 100 | 4096
Time (TimeSpan) | MaximumBatchingTimeSpan | 00:05:00 | 00:00:20 - 00:00:30 | 00:00:10 | 00:30:00

The most effective way of controlling the end-to-end latency using ingestion batching policy is to alter its time boundary at table or database level, according to the higher bound of latency requirements. A database level policy affects all tables in that database that don’t have the table-level policy defined, and any newly created table.
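
For example, the following sketch (the table name is hypothetical, and assumes the standard .alter table ... policy ingestionbatching syntax) lowers the time boundary to 30 seconds for a latency-sensitive table while keeping the default count and size limits:

.alter table MyTable policy ingestionbatching @'{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'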

Batch data size

The batching policy data size is set for uncompressed data. For Parquet, AVRO, and ORC files, an estimation is calculated based on file size. For compressed data, the uncompressed data size is evaluated as follows in descending order of accuracy:

  1. If the uncompressed size is provided in the ingestion source options, that value is used.
  2. When ingesting local files using SDKs, zip archives and gzip streams are inspected to assess their raw size.
  3. If previous options don’t provide a data size, a factor is applied to the compressed data size to estimate the uncompressed data size.

Batching latencies

Latencies can result from many causes that can be addressed using batching policy settings.

CauseSolution
Data latency matches the time setting, with too little data to reach the size or count limitReduce the time limit
Inefficient batching due to a large number of very small filesIncrease the size of the source files. If using Kafka Sink, configure it to send data in ~100 KB chunks or higher. If you have many small files, increase the count (up to 2000) in the database or table ingestion policy.
Batching a large amount of uncompressed dataThis is common when ingesting Parquet files. Incrementally decrease size for the table or database batching policy towards 250 MB and check for improvement.
Backlog because the database is under scaledAccept any Azure advisor suggestions to scale aside or scale up your database. Alternatively, manually scale your database to see if the backlog is closed. If these options don’t work, contact support for assistance.

7.9 - Ingestion time

7.9.1 - IngestionTime policy

This article describes IngestionTime policy.

The IngestionTime policy is an optional policy that can be set (enabled) on tables.

When enabled, Kusto adds a hidden datetime column to the table, called $IngestionTime. Now, whenever new data is ingested, the time of ingestion is recorded in the hidden column. That time is measured just before the data is committed.

Since the ingestion time column is hidden, you can’t directly query for its value. Instead, a special function called ingestion_time() retrieves that value. If there’s no datetime column in the table, or the IngestionTime policy wasn’t enabled when a record was ingested, a null value is returned.

The IngestionTime policy is designed for two main scenarios:

  • To allow users to estimate the latency in ingesting data. Many tables with log data have a timestamp column whose value is filled by the source and indicates the time when the record was produced. By comparing that column's value with the ingestion time column, you can estimate the ingestion latency (see the example query after this list).

    [!NOTE] The calculated value is only an estimate, because the source and Kusto don’t necessarily have their clocks synchronized.

  • To support database cursors, which let users issue consecutive queries where each query is limited to the data that was ingested since the previous query.
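
For the first scenario, a minimal sketch of the latency estimate, assuming a hypothetical table MyLogs with a source-populated Timestamp column:

MyLogs
| where isnotnull(ingestion_time())
| extend IngestionLatency = ingestion_time() - Timestamp
| summarize avg(IngestionLatency), percentile(IngestionLatency, 95)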

For more information, see the management commands for managing the IngestionTime policy.

7.10 - Managed identity

7.10.1 - Kusto ManagedIdentity policy

Learn about the ManagedIdentity policy to control managed identities.

ManagedIdentity is a policy that controls which managed identities can be used for what purposes. For example, you can configure a policy that allows a specific managed identity to be used for accessing a storage account for ingestion purposes.

This policy can be enabled at the cluster and database levels. The policy is additive, meaning that for every operation that involves a managed identity, the operation will be permitted if the usage is allowed at either the cluster or database level.

Permissions

Creating or altering a managed identity policy requires AllDatabasesAdmin permissions.

The ManagedIdentity policy object

A cluster or database may have zero or more ManagedIdentity policy objects associated with it. Each ManagedIdentity policy object has the following user-definable properties: DisplayName and AllowedUsages. Other properties are automatically populated from the managed identity associated with the specified ObjectId and displayed for convenience.

The following table describes the properties of the ManagedIdentity policy object:

PropertyTypeRequiredDescription
ObjectIdstring✔️Either the actual object ID of the managed identity or the reserved keyword system to reference the System Managed Identity of the cluster on which the command is run.
ClientIdstringNot applicableThe client ID of the managed identity.
TenantIdstringNot applicableThe tenant ID of the managed identity.
DisplayNamestringNot applicableThe display name of the managed identity.
IsSystemboolNot applicableA Boolean value indicating true if the identity is a System Managed Identity; false if otherwise.
AllowedUsagesstring✔️A list of comma-separated allowed usage values for the managed identity. See managed identity usages.

The following is an example of a ManagedIdentity policy object:

{
  "ObjectId": "<objectID>",
  "ClientId": "<clientID>",
  "TenantId": "<tenantID",
  "DisplayName": "myManagedIdentity",
  "IsSystem": false,
  "AllowedUsages": "NativeIngestion, ExternalTable"
}

Managed identity usages

The following values specify authentication to a usage using the configured managed identity:

ValueDescription
AllAll current and future usages are allowed.
AutomatedFlowsRun a Continuous Export or Update Policy automated flow on behalf of a managed identity.
AzureAIAuthenticate to an Azure OpenAI service using the ai_embed_text plugin with a managed identity.
DataConnectionAuthenticate to data connections to an Event Hub or an Event Grid.
ExternalTableAuthenticate to external tables using connection strings configured with a managed identity.
NativeIngestionAuthenticate to an SDK for native ingestion from an external source.
SandboxArtifactsAuthenticate to external artifacts referenced in sandboxed plugins (e.g., Python) with a managed identity. This usage needs to be defined on the cluster level managed identity policy.
SqlRequestAuthenticate to an external database using the sql_request or cosmosdb_request plugin with a managed identity.
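
As a sketch, assuming the .alter-merge database ... policy managed_identity command, the following would allow a managed identity (the object ID is a placeholder) to be used for native ingestion in a hypothetical database:

.alter-merge database MyDatabase policy managed_identity @'[{"ObjectId": "<objectID>", "AllowedUsages": "NativeIngestion"}]'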

7.11 - Merge policy

7.11.1 - Extents merge policy

Learn how to use the merge policy to define how extents are merged.

The merge policy defines if and how Extents (data shards) should get merged.

There are two types of merge operations: Merge, which rebuilds indexes, and Rebuild, which completely reingests the data.

Both operation types result in a single extent that replaces the source extents.

By default, Rebuild operations are preferred. If there are extents that don’t fit the criteria for being rebuilt, then an attempt will be made to merge them.

Merge policy properties

The merge policy contains the following properties:

  • RowCountUpperBoundForMerge:
    • Defaults to 16,000,000.
    • Maximum allowed row count of the merged extent.
    • Applies to Merge operations, not Rebuild.
  • OriginalSizeMBUpperBoundForMerge:
    • Defaults to 30,000.
    • Maximum allowed original size (in MBs) of the merged extent.
    • Applies to Merge operations, not Rebuild.
  • MaxExtentsToMerge:
    • Defaults to 100.
    • Maximum allowed number of extents to be merged in a single operation.
    • Applies to Merge operations.
    • This value shouldn’t be changed.
  • AllowRebuild:
    • Defaults to true.
    • Defines whether Rebuild operations are enabled (in which case, they’re preferred over Merge operations).
  • AllowMerge:
    • Defaults to true.
    • Defines whether Merge operations are enabled, in which case, they’re less preferred than Rebuild operations.
  • MaxRangeInHours:
    • Defaults to 24.
    • The maximum allowed difference, in hours, between any two different extents’ creation times, so that they can still be merged.
    • Timestamps are of extent creation, and don’t relate to the actual data contained in the extents.
    • Applies to both Merge and Rebuild operations.
    • In materialized views: defaults to 336 (14 days), unless recoverability is disabled in the materialized view’s effective retention policy.
    • This value should be set according to the effective retention policy SoftDeletePeriod, or cache policy DataHotSpan values. Take the lower value of SoftDeletePeriod and DataHotSpan, and set the MaxRangeInHours value to between 2-3% of it. See the examples below.
  • Lookback:
    • Defines the timespan during which extents are considered for rebuild/merge.
    • Supported values:
      • Default - The system-managed default. This is the recommended and default value, whose period is currently set to 14 days.
      • All - All extents, hot and cold, are included.
      • HotCache - Only hot extents are included.
      • Custom - Only extents whose age is under the provided CustomPeriod are included. CustomPeriod is a timespan value in the format dd.hh:mm.

Default policy example

The following example shows the default policy:

{
  "RowCountUpperBoundForMerge": 16000000,
  "OriginalSizeMBUpperBoundForMerge": 30000,
  "MaxExtentsToMerge": 100,,
  "MaxRangeInHours": 24,
  "AllowRebuild": true,
  "AllowMerge": true,
  "Lookback": {
    "Kind": "Default",
    "CustomPeriod": null
  }
}

MaxRangeInHours examples

| min(SoftDeletePeriod (Retention Policy), DataHotSpan (Cache Policy)) | Max Range in hours (Merge Policy) |
|--|--|
| 7 days (168 hours) | 4 |
| 14 days (336 hours) | 8 |
| 30 days (720 hours) | 18 |
| 60 days (1,440 hours) | 36 |
| 90 days (2,160 hours) | 60 |
| 180 days (4,320 hours) | 120 |
| 365 days (8,760 hours) | 250 |
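For example, with a soft-delete period of 14 days (336 hours), 2-3% of 336 hours is roughly 7-10 hours, which matches the table value of 8. As a minimal sketch (assuming the standard merge policy command syntax, with MyTable as a placeholder table name), the value could be applied as follows:

.alter-merge table MyTable policy merge @'{"MaxRangeInHours": 8}'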

When a database is created, it’s set with the default merge policy values mentioned above. The policy is by default inherited by all tables created in the database, unless their policies are explicitly overridden at table-level.

For more information, see management commands that allow you to manage merge policies for databases or tables.

7.12 - Mirroring policy

7.12.1 - Mirroring policy

Learn how to use the mirroring policy.

The mirroring policy commands allow you to view, change, partition, and delete your table mirroring policy. They also provide a way to check the mirroring latency by reviewing the operations mirroring status.

Management commands

The policy object

The mirroring policy includes the following properties:

| Property | Description | Values | Default |
|--|--|--|--|
| Format | The format of your mirrored files. | Valid value is parquet. | parquet |
| ConnectionStrings | An array of connection strings that help configure and establish connections. This value is autopopulated. | | |
| IsEnabled | Determines whether the mirroring policy is enabled. When the mirroring policy is disabled and set to false, the underlying mirroring data is retained in the database. | true, false, null | null |
| Partitions | A comma-separated list of columns used to divide the data into smaller partitions. | See Partitions formatting. | |

Data types mapping

To ensure compatibility and optimize queries, ensure that your data types are properly mapped to the parquet data types.

Event house to Delta parquet data types mapping

Event house data types are mapped to Delta Parquet data types using the following rules:

| Event house data type | Delta data type |
|--|--|
| bool | boolean |
| datetime | timestamp OR date (for date-bound partition definitions) |
| dynamic | string |
| guid | string |
| int | integer |
| long | long |
| real | double |
| string | string |
| timespan | long |
| decimal | decimal(38,18) |

For more information on Event house data types, see Scalar data types.

Example policy

{
  "Format": "parquet",
  "IsEnabled": true,
  "Partitions": null,
}

7.13 - Partitioning policy

7.13.1 - Partitioning policy

Learn how to use the partitioning policy to improve query performance.

The partitioning policy defines if and how extents (data shards) should be partitioned for a specific table or a materialized view.

The policy triggers an additional background process that takes place after the creation of extents, following data ingestion. This process includes reingesting data from the source extents and producing homogeneous extents, in which all values of the column designated as the partition key reside within a single partition.

The primary objective of the partitioning policy is to enhance query performance in specific supported scenarios.

Supported scenarios

The following are the only scenarios in which setting a data partitioning policy is recommended. In all other scenarios, setting the policy isn’t advised.

  • Frequent filters on a medium or high cardinality string or guid column:
    • For example: multitenant solutions, or a metrics table where most or all queries filter on a column of type string or guid, such as the TenantId or the MetricId.
    • Medium cardinality is at least 10,000 distinct values.
    • Set the hash partition key to be the string or guid column, and set the PartitionAssignmentMode property to uniform.
  • Frequent aggregations or joins on a high cardinality string or guid column:
    • For example, IoT information from many different sensors, or academic records of many different students.
    • High cardinality is at least 1,000,000 distinct values, where the distribution of values in the column is approximately even.
    • In this case, set the hash partition key to be the column frequently grouped-by or joined-on, and set the PartitionAssignmentMode property to ByPartition.
  • Out-of-order data ingestion:
    • Data ingested into a table might not be ordered and partitioned into extents (shards) according to a specific datetime column that represents the data creation time and is commonly used to filter data. This could be due to a backfill from heterogeneous source files that include datetime values over a large time span.
    • In this case, set the uniform range datetime partition key to be the datetime column.
    • If you need retention and caching policies to align with the datetime values in the column, instead of aligning with the time of ingestion, set the OverrideCreationTime property to true.

Partition keys

The following kinds of partition keys are supported.

| Kind | Column type | Partition properties | Partition value |
|--|--|--|--|
| Hash | string or guid | Function, MaxPartitionCount, Seed, PartitionAssignmentMode | Function(ColumnName, MaxPartitionCount, Seed) |
| Uniform range | datetime | RangeSize, Reference, OverrideCreationTime | bin_at(ColumnName, RangeSize, Reference) |

Hash partition key

If the policy includes a hash partition key, all homogeneous extents that belong to the same partition will be assigned to the same data node.

  • A hash-modulo function is used to partition the data.
  • Data in homogeneous (partitioned) extents is ordered by the hash partition key.
    • You don’t need to include the hash partition key in the row order policy, if one is defined on the table.
  • Queries that use the shuffle strategy, and in which the shuffle key used in join, summarize or make-series is the table’s hash partition key, are expected to perform better because the amount of data required to move across nodes is reduced.

Partition properties

| Property | Description | Supported value(s) | Recommended value |
|--|--|--|--|
| Function | The name of a hash-modulo function to use. | XxHash64 | |
| MaxPartitionCount | The maximum number of partitions to create (the modulo argument to the hash-modulo function) per time period. | In the range (1,2048]. | The recommended value is 128. Higher values lead to greater overhead of the data partitioning process, a higher number of extents for each time period, and a larger metadata size, and are therefore not recommended. |
| Seed | Used for randomizing the hash value. | A positive integer. | 1, which is also the default value. |
| PartitionAssignmentMode | The mode used for assigning partitions to nodes. | ByPartition: All homogeneous (partitioned) extents that belong to the same partition are assigned to the same node. Uniform: An extent's partition values are disregarded, and extents are assigned uniformly to the nodes. | If queries don't join or aggregate on the hash partition key, use Uniform. Otherwise, use ByPartition. |

Hash partition key example

A hash partition key over a string-typed column named tenant_id. It uses the XxHash64 hash function, with MaxPartitionCount set to the recommended value 128, and the default Seed of 1.

{
  "ColumnName": "tenant_id",
  "Kind": "Hash",
  "Properties": {
    "Function": "XxHash64",
    "MaxPartitionCount": 128,
    "Seed": 1,
    "PartitionAssignmentMode": "Uniform"
  }
}

Uniform range datetime partition key

When data isn't ingested in order of a datetime column that is commonly used to filter it (as in the out-of-order data ingestion scenario described earlier), you can reshuffle the data between extents so that each extent includes records from a limited time range. This process results in filters on the datetime column being more effective at query time.

The partition function used is bin_at() and isn’t customizable.

Partition properties

| Property | Description | Recommended value |
|--|--|--|
| RangeSize | A timespan scalar constant that indicates the size of each datetime partition. | Start with the value 1.00:00:00 (one day). Don't set a shorter value, because it may result in the table having a large number of small extents that can't be merged. |
| Reference | A datetime scalar constant that indicates a fixed point in time, according to which datetime partitions are aligned. | Start with 1970-01-01 00:00:00. If there are records in which the datetime partition key has null values, their partition value is set to the value of Reference. |
| OverrideCreationTime | A bool indicating whether or not the result extent's minimum and maximum creation times should be overridden by the range of the values in the partition key. | Defaults to false. Set to true if data isn't ingested in order of time of arrival. For example, a single source file may include datetime values that are distant, and/or you may want to enforce retention or caching based on the datetime values rather than the time of ingestion. |

When OverrideCreationTime is set to true, extents may be missed in the merge process. Extents are missed if their creation time is older than the Lookback period of the table’s Extents merge policy. To make sure that the extents are discoverable, set the Lookback property to HotCache.

Uniform range datetime partition example

The snippet shows a uniform datetime range partition key over a datetime typed column named timestamp. It uses datetime(2021-01-01) as its reference point, with a size of 7d for each partition, and doesn’t override the extents’ creation times.

{
  "ColumnName": "timestamp",
  "Kind": "UniformRange",
  "Properties": {
    "Reference": "2021-01-01T00:00:00",
    "RangeSize": "7.00:00:00",
    "OverrideCreationTime": false
  }
}

The policy object

By default, a table’s data partitioning policy is null, in which case data in the table won’t be repartitioned after it’s ingested.

The data partitioning policy has the following main properties:

  • PartitionKeys:

    • A collection of partition keys (of the kinds described above) that define how to partition the data in the table.

  • EffectiveDateTime:

    • The UTC datetime from which the policy is effective.
    • This property is optional. If it isn't specified, the policy takes effect for data ingested after the policy was applied.

Data partitioning example

Data partitioning policy object with two partition keys.

  1. A hash partition key over a string-typed column named tenant_id.
    • It uses the XxHash64 hash function, with MaxPartitionCount set to the recommended value 128, and the default Seed of 1.
  2. A uniform datetime range partition key over a datetime type column named timestamp.
    • It uses datetime(2021-01-01) as its reference point, with a size of 7d for each partition.
{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "2021-01-01T00:00:00",
        "RangeSize": "7.00:00:00",
        "OverrideCreationTime": false
      }
    }
  ]
}
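To apply a policy object like the one above to a table, you can use the .alter table policy partitioning command. The following is a minimal sketch with MyTable as a placeholder table name and only the hash key shown; verify the exact syntax against the partitioning policy command reference:

.alter table MyTable policy partitioning ```
{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    }
  ]
}```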

Additional properties

The following properties can be defined as part of the policy. These properties are optional and we recommend not changing them.

| Property | Description | Recommended value | Default value |
|--|--|--|--|
| MinRowCountPerOperation | Minimum target for the sum of the row count of the source extents of a single data partitioning operation. | | 0 |
| MaxRowCountPerOperation | Maximum target for the sum of the row count of the source extents of a single data partitioning operation. | Set a value lower than 5M if you see that the partitioning operations consume a large amount of memory or CPU per operation. | 0, with a default target of 5,000,000 records. |
| MaxOriginalSizePerOperation | Maximum target for the sum of the original size (in bytes) of the source extents of a single data partitioning operation. | If the partitioning operations consume a large amount of memory or CPU per operation, set a value lower than 5 GB. | 0, with a default target of 5,368,709,120 bytes (5 GB). |

The data partitioning process

  • Data partitioning runs as a post-ingestion background process.
    • A table that is continuously ingested into is expected to always have a “tail” of data that is yet to be partitioned (nonhomogeneous extents).
  • Data partitioning runs only on hot extents, regardless of the value of the EffectiveDateTime property in the policy.
    • If partitioning cold extents is required, you need to temporarily adjust the caching policy.

You can monitor the partitioning status of tables with defined policies in a database by using the .show database extents partitioning statistics command and partitioning metrics.

Partitioning capacity

  • The data partitioning process results in the creation of more extents. The extents merge capacity may gradually increase, so that the process of merging extents can keep up.

  • If there’s a high ingestion throughput, or a large enough number of tables that have a partitioning policy defined, then the Extents partition capacity may gradually increase, so that the process of partitioning extents can keep up.

  • To avoid consuming too many resources, these dynamic increases are capped. If the capacities are used up entirely, you may need to increase them gradually and linearly beyond the cap.

    • If increasing the capacities causes a significant increase in the use of the cluster’s resources, you can scale the cluster up/out, either manually, or by enabling autoscale.

Limitations

  • Attempts to partition data in a database that already has more than 5,000,000 extents will be throttled.
    • In such cases, the EffectiveDateTime property of partitioning policies of tables in the database will be automatically delayed by several hours, so that you can reevaluate your configuration and policies.

Outliers in partitioned columns

  • The following situations can contribute to imbalanced distribution of data across nodes, and degrade query performance:
    • If a hash partition key includes values that are much more prevalent than others (for example, an empty string, or a generic value such as null or N/A), or values that represent an entity (such as tenant_id) that is more prevalent in the dataset.
  • If a uniform range datetime partition key has a large enough percentage of values that are “far” from the majority of the values in the column, the overhead of the data partitioning process is increased and may lead to many small extents to keep track of. An example of such a situation is datetime values from the distant past or future.

In both of these cases, either “fix” the data, or filter out any irrelevant records in the data before or at ingestion time, to reduce the overhead of the data partitioning. For example, use an update policy.

7.14 - Query acceleration policy

7.14.1 - Query acceleration policy (preview)

Learn how to use the query acceleration policy to accelerate queries over external delta tables.

An external table is a schema entity that references data stored outside a Kusto database. Queries run over external tables can be less performant than queries over ingested data, due to factors such as network calls to fetch data from storage, the absence of indexes, and more. Query acceleration lets you specify a policy on top of external delta tables. This policy defines the number of days of data to accelerate for high-performance queries.

Query acceleration is supported in Azure Data Explorer over Azure Data Lake Store Gen2 or Azure blob storage external tables.

Query acceleration is supported in Eventhouse over OneLake, Azure Data Lake Store Gen2, or Azure blob storage external tables.

To enable query acceleration in the Fabric UI, see Query acceleration over OneLake shortcuts.

Limitations

  • The number of columns in the external table can’t exceed 900.
  • Delta tables with checkpoint V2 are not supported.
  • Query performance over accelerated external delta tables which have partitions may not be optimal during preview.
  • The feature assumes delta tables with static advanced features, for example column mapping doesn’t change, partitions don’t change, and so on. To change advanced features, first disable the policy, and once the change is made, re-enable the policy.
  • Schema changes on the delta table must also be followed with the respective .alter external delta table schema, which might result in acceleration starting from scratch if there was a breaking schema change.
  • Index-based pruning isn’t supported for partitions.
  • Parquet files larger than 1 GB won’t be cached.
  • Query acceleration isn’t supported for external tables with impersonation authentication.

Known issues

  • Data in the external delta table that is optimized with the OPTIMIZE function needs to be reaccelerated.
  • If you run frequent MERGE/UPDATE/DELETE operations in delta, the underlying parquet files may be rewritten with changes. Kusto skips accelerating such files, so the data is retrieved from storage at query time.
  • The system assumes that all artifacts under the delta table directory have the same access level for the selected users. Different files having different access permissions under the delta table directory might result in unexpected behavior.

Commands for query acceleration
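The policy is managed with the query acceleration policy management commands. As a hedged sketch only, enabling acceleration for the most recent day of data on an external delta table might look like the following, where MyExternalDeltaTable is a placeholder and the property names (IsEnabled, Hot) are assumptions to be verified against the command reference:

.alter external table MyExternalDeltaTable policy query_acceleration '{"IsEnabled": true, "Hot": "1.00:00:00"}'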

7.15 - Query weak consistency policy

7.15.1 - Query weak consistency policy

Learn how to use the query weak consistency policy to configure the weak consistency service.

The query weak consistency policy is a cluster-level policy object that configures the weak consistency service.

Management commands

The policy object

The query weak consistency policy includes the following properties:

| Property | Description | Values | Default |
|--|--|--|--|
| PercentageOfNodes | The percentage of nodes in the cluster that execute the query weak consistency service (the selected nodes execute the weakly consistent queries). | An integer between 1 and 100, or -1 for the default value (currently 20%). | -1 |
| MinimumNumberOfNodes | Minimum number of nodes that execute the query weak consistency service (determines the number of nodes in case PercentageOfNodes*#NodesInCluster is smaller). | A positive integer, or -1 for the default value (currently 2). Smaller than or equal to MaximumNumberOfNodes. | -1 |
| MaximumNumberOfNodes | Maximum number of nodes that execute the query weak consistency service (determines the number of nodes in case PercentageOfNodes*#NodesInCluster is greater). | A positive integer, or -1 for the default value (currently 30). Greater than or equal to MinimumNumberOfNodes. | -1 |
| SuperSlackerNumberOfNodesThreshold | If the total number of nodes in the cluster exceeds this number, nodes that execute the weak consistency service become 'super slacker', meaning they don't hold data (in order to reduce load). See Warning below. | A positive integer that is greater than or equal to 4, or -1 for the default value (currently no threshold - weak consistency nodes won't become 'super slacker'). | -1 |
| EnableMetadataPrefetch | When set to true, database metadata is preloaded when the cluster comes up, and reloaded every few minutes, on all weak consistency nodes. When set to false, database metadata load is triggered by queries (on demand), so some queries might be delayed until the database metadata is pulled from storage. Database metadata must be reloaded from storage to query the database when its age is greater than MaximumLagAllowedInMinutes. See Warning and Important below. | true or false | false |
| MaximumLagAllowedInMinutes | The maximum duration (in minutes) that weakly consistent metadata is allowed to lag behind. If metadata is older than this value, the most up-to-date metadata is pulled from storage (when the database is queried, or periodically if EnableMetadataPrefetch is enabled). See Warning below. | An integer between 1 and 60, or -1 for the default value (currently 5 minutes). | -1 |
| RefreshPeriodInSeconds | The refresh period (in seconds) for updating database metadata on each weak consistency node. See Warning below. | An integer between 30 and 1800, or -1 for the default value (currently 120 seconds). | -1 |

Default policy

The default policy is:

{
  "PercentageOfNodes": -1,
  "MinimumNumberOfNodes": -1,
  "MaximumNumberOfNodes": -1,
  "SuperSlackerNumberOfNodesThreshold": -1,
  "EnableMetadataPrefetch": false,
  "MaximumLagAllowedInMinutes": -1,
  "RefreshPeriodInSeconds": -1
}
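To change one or more properties while keeping the rest at their defaults, you can use the query weak consistency policy management commands. The following is a minimal sketch (the value shown is illustrative only); verify the exact syntax against the command reference:

.alter-merge cluster policy query_weak_consistency @'{"PercentageOfNodes": 10}'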

7.16 - Restricted view access

7.16.1 - Restricted view access policy

Learn how to use the restricted view access policy to limit the principals who can query specified tables in a database.

The restricted view access policy is an optional security feature that governs view permissions on a table. By default, the policy is disabled. When enabled, the policy adds an extra layer of permission requirements for principals to access and view the table.

For a table with an enabled restricted view access policy, only principals assigned the UnrestrictedViewer role have the necessary permissions to view the table. Even principals with roles like Table Admin or Database Admin are restricted unless granted the UnrestrictedViewer role.

While the restricted view access policy is specific to individual tables, the UnrestrictedViewer role operates at the database level. As a result, a principal with the UnrestrictedViewer role has view permissions for all tables within the database. For more detailed information on managing table view access, see Manage view access to tables.
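As a minimal sketch of the end-to-end setup (MyTable, MyDatabase, and the principal are placeholders), the policy is enabled on the table and the UnrestrictedViewer role is then granted at the database level:

.alter table MyTable policy restricted_view_access true

.add database MyDatabase unrestrictedviewers ('aaduser=user@fabrikam.com')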

Limitations

7.17 - Retention policy

7.17.1 - Retention policy

Learn how to use the retention policy to control how data is removed.

The retention policy controls the mechanism that automatically removes data from tables or materialized views. It’s useful to remove data that continuously flows into a table, and whose relevance is age-based. For example, the policy can be used for a table that holds diagnostics events that may become uninteresting after two weeks.

The retention policy can be configured for a specific table or materialized view, or for an entire database. The policy then applies to all tables in the database that don’t override it. When the policy is configured both at the database and table level, the retention policy in the table takes precedence over the database policy.

Setting up a retention policy is important when continuously ingesting data, which will limit costs.

Data that is “outside” the retention policy is eligible for removal. There’s no specific guarantee when removal occurs. Data may “linger” even if the retention policy is triggered.

The retention policy is most commonly set to limit the age of the data since ingestion. For more information, see SoftDeletePeriod.

It's guaranteed that data isn't deleted before the limit is exceeded, but deletion isn't immediate following that point.

The policy object

A retention policy includes the following properties:

  • SoftDeletePeriod:
    • Time span for which it’s guaranteed that the data is kept available to query. The period is measured starting from the time the data was ingested.
    • Defaults to 1,000 years.
    • When altering the soft-delete period of a table or database, the new value applies to both existing and new data.
  • Recoverability:
    • Data recoverability (Enabled/Disabled) after the data was deleted.
    • Defaults to Enabled.
    • If set to Enabled, the data will be recoverable for 14 days after it’s been soft-deleted.
    • It is not possible to configure the recoverability period.

Management commands

Defaults

By default, when a database or a table is created, it doesn’t have a retention policy defined. Normally, the database is created and then immediately has its retention policy set by its creator according to known requirements. When you run a .show command for the retention policy of a database or table that hasn’t had its policy set, Policy appears as null.

The default retention policy, with the default values mentioned above, can be applied using the following command.

.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
.alter materialized-view ViewName policy retention "{}"

The command results in the following policy object applied to the database or table.

{
  "SoftDeletePeriod": "365000.00:00:00", "Recoverability":"Enabled"
}

Clearing the retention policy of a database or table can be done using the following command.

.delete database DatabaseName policy retention
.delete table TableName policy retention

Examples

For an environment that has a database named MyDatabase, with tables MyTable1, MyTable2, and MySpecialTable.

Soft-delete period of seven days and recoverability disabled

Set all tables in the database to have a soft-delete period of seven days and disabled recoverability.

  • Option 1 (Recommended): Set a database-level retention policy, and verify there are no table-level policies set.

    .delete table MyTable1 policy retention        // optional, only if the table previously had its policy set
    .delete table MyTable2 policy retention        // optional, only if the table previously had its policy set
    .delete table MySpecialTable policy retention  // optional, only if the table previously had its policy set
    .alter-merge database MyDatabase policy retention softdelete = 7d recoverability = disabled
    .alter-merge materialized-view ViewName policy retention softdelete = 7d 
    
  • Option 2: For each table, set a table-level retention policy, with a soft-delete period of seven days and recoverability disabled.

    .alter-merge table MyTable1 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MyTable2 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MySpecialTable policy retention softdelete = 7d recoverability = disabled
    

Soft-delete period of seven days and recoverability enabled

  • Set tables MyTable1 and MyTable2 to have a soft-delete period of seven days and recoverability disabled.

  • Set MySpecialTable to have a soft-delete period of 14 days and recoverability enabled.

  • Option 1 (Recommended): Set a database-level retention policy, and set a table-level retention policy.

    .delete table MyTable1 policy retention   // optional, only if the table previously had its policy set
    .delete table MyTable2 policy retention   // optional, only if the table previously had its policy set
    .alter-merge database MyDatabase policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MySpecialTable policy retention softdelete = 14d recoverability = enabled
    
  • Option 2: For each table, set a table-level retention policy, with the relevant soft-delete period and recoverability.

    .alter-merge table MyTable1 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MyTable2 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MySpecialTable policy retention softdelete = 14d recoverability = enabled
    

Soft-delete period of seven days, and MySpecialTable keeps its data indefinitely

Set tables MyTable1 and MyTable2 to have a soft-delete period of seven days, and have MySpecialTable keep its data indefinitely.

  • Option 1: Set a database-level retention policy, and set a table-level retention policy, with a soft-delete period of 1,000 years, the default retention policy, for MySpecialTable.

    .delete table MyTable1 policy retention   // optional, only if the table previously had its policy set
    .delete table MyTable2 policy retention   // optional, only if the table previously had its policy set
    .alter-merge database MyDatabase policy retention softdelete = 7d
    .alter table MySpecialTable policy retention "{}" // this sets the default retention policy
    
  • Option 2: For tables MyTable1 and MyTable2, set a table-level retention policy, and verify that the database-level and table-level policy for MySpecialTable aren’t set.

    .delete database MyDatabase policy retention   // optional, only if the database previously had its policy set
    .delete table MySpecialTable policy retention   // optional, only if the table previously had its policy set
    .alter-merge table MyTable1 policy retention softdelete = 7d
    .alter-merge table MyTable2 policy retention softdelete = 7d
    
  • Option 3: For tables MyTable1 and MyTable2, set a table-level retention policy. For table MySpecialTable, set a table-level retention policy with a soft-delete period of 1,000 years, the default retention policy.

    .alter-merge table MyTable1 policy retention softdelete = 7d
    .alter-merge table MyTable2 policy retention softdelete = 7d
    .alter table MySpecialTable policy retention "{}"
    

7.18 - Row level security policy

7.18.1 - Row level security policy

Learn how to use the Row Level Security policy to control access to rows in a database table.

Use group membership or execution context to control access to rows in a database table.

Row Level Security (RLS) simplifies the design and coding of security. It lets you apply restrictions on data row access in your application. For example, limit user access to rows relevant to their department, or restrict customer access to only the data relevant to their company.

The access restriction logic is located in the database tier, rather than away from the data in another application tier. The database system applies the access restrictions every time data access is attempted from any tier. This logic makes your security system more reliable and robust by reducing the surface area of your security system.

RLS lets you provide access to other applications and users, only to a certain portion of a table. For example, you might want to:

  • Grant access only to rows that meet some criteria
  • Anonymize data in some of the columns
  • All of the above

For more information, see management commands for managing the Row Level Security policy.

Limitations

  • There’s no limit on the number of tables on which Row Level Security policy can be configured.
  • Row Level Security policy cannot be configured on External Tables.
  • The RLS policy can’t be enabled on a table under the following circumstances:
  • The RLS query can’t reference other tables that have Row Level Security policy enabled.
  • The RLS query can’t reference tables located in other databases.

Examples

Limit access to Sales table

In a table named Sales, each row contains details about a sale. One of the columns contains the name of the salesperson. Instead of giving your salespeople access to all records in Sales, enable a Row Level Security policy on this table to only return records where the salesperson is the current user:

Sales | where SalesPersonAadUser == current_principal()

You can also mask the email address:

Sales | where SalesPersonAadUser == current_principal() | extend EmailAddress = "****"

If you want every sales person to see all the sales of a specific country/region, you can define a query similar to:

let UserToCountryMapping = datatable(User:string, Country:string)
[
  "john@domain.com", "USA",
  "anna@domain.com", "France"
];
Sales
| where Country in ((UserToCountryMapping | where User == current_principal_details()["UserPrincipalName"] | project Country))

If you have a group that contains the managers, you might want to give them access to all rows. Here’s the query for the Row Level Security policy.

let IsManager = current_principal_is_member_of('aadgroup=sales_managers@domain.com');
let AllData = Sales | where IsManager;
let PartialData = Sales | where not(IsManager) and (SalesPersonAadUser == current_principal()) | extend EmailAddress = "****";
union AllData, PartialData

Expose different data to members of different Microsoft Entra groups

If you have multiple Microsoft Entra groups, and you want the members of each group to see a different subset of data, use this structure for an RLS query.

Customers
| where (current_principal_is_member_of('aadgroup=group1@domain.com') and <filtering specific for group1>) or
        (current_principal_is_member_of('aadgroup=group2@domain.com') and <filtering specific for group2>) or
        (current_principal_is_member_of('aadgroup=group3@domain.com') and <filtering specific for group3>)

Apply the same RLS function on multiple tables

First, define a function that receives the table name as a string parameter, and references the table using the table() operator.

For example:

.create-or-alter function RLSForCustomersTables(TableName: string) {
    table(TableName)
    | ...
}

Then configure RLS on multiple tables this way:

.alter table Customers1 policy row_level_security enable "RLSForCustomersTables('Customers1')"
.alter table Customers2 policy row_level_security enable "RLSForCustomersTables('Customers2')"
.alter table Customers3 policy row_level_security enable "RLSForCustomersTables('Customers3')"

Produce an error upon unauthorized access

If you want nonauthorized table users to receive an error instead of returning an empty table, use the assert() function. The following example shows you how to produce this error in an RLS function:

.create-or-alter function RLSForCustomersTables() {
    MyTable
    | where assert(current_principal_is_member_of('aadgroup=mygroup@mycompany.com') == true, "You don't have access")
}

You can combine this approach with other examples. For example, you can display different results to users in different Microsoft Entra groups, and produce an error for everyone else.

Control permissions on follower databases

The RLS policy that you configure on the production database will also take effect in the follower databases. You can't configure different RLS policies on the production and follower databases. However, you can use the current_cluster_endpoint() function in your RLS query to achieve the same effect as having different RLS queries in follower tables.

For example:

.create-or-alter function RLSForCustomersTables() {
    let IsProductionCluster = current_cluster_endpoint() == "mycluster.eastus.kusto.windows.net";
    let DataForProductionCluster = TempTable | where IsProductionCluster;
    let DataForFollowerClusters = TempTable | where not(IsProductionCluster) | extend EmailAddress = "****";
    union DataForProductionCluster, DataForFollowerClusters
}

Control permissions on shortcut databases

The RLS policy that you configure on the production database will also take effect in the shortcut databases. You can't configure different RLS policies on the production and shortcut databases. However, you can use the current_cluster_endpoint() function in your RLS query to achieve the same effect as having different RLS queries in shortcut tables.

For example:

.create-or-alter function RLSForCustomersTables() {
    let IsProductionCluster = current_cluster_endpoint() == "mycluster.eastus.kusto.windows.net";
    let DataForProductionCluster = TempTable | where IsProductionCluster;
    let DataForFollowerClusters = TempTable | where not(IsProductionCluster) | extend EmailAddress = "****";
    union DataForProductionCluster, DataForFollowerClusters
}

More use cases

  • A call center support person may identify callers by several digits of their social security number. This number shouldn’t be fully exposed to the support person. An RLS policy can be applied on the table to mask all but the last four digits of the social security number in the result set of any query.
  • Set an RLS policy that masks personally identifiable information (PII), and enables developers to query production environments for troubleshooting purposes without violating compliance regulations.
  • A hospital can set an RLS policy that allows nurses to view data rows for their patients only.
  • A bank can set an RLS policy to restrict access to financial data rows based on an employee’s business division or role.
  • A multi-tenant application can store data from many tenants in a single tableset (which is efficient). They would use an RLS policy to enforce a logical separation of each tenant’s data rows from every other tenant’s rows, so each tenant can see only its data rows.

Performance impact on queries

When an RLS policy is enabled on a table, there will be some performance impact on queries that access that table. Access to the table will be replaced by the RLS query that’s defined on that table. The performance impact of an RLS query will normally consist of two parts:

  • Membership checks in Microsoft Entra ID: Checks are efficient. You can check membership in tens, or even hundreds of groups without major impact on the query performance.
  • Filters, joins, and other operations that are applied on the data: Impact depends on the complexity of the query.

For example:

let IsRestrictedUser = current_principal_is_member_of('aadgroup=some_group@domain.com');
let AllData = MyTable | where not(IsRestrictedUser);
let PartialData = MyTable | where IsRestrictedUser and (...);
union AllData, PartialData

If the user isn’t part of some_group@domain.com, then IsRestrictedUser is evaluated to false. The query that is evaluated is similar to this one:

let AllData = MyTable;           // the condition evaluates to `true`, so the filter is dropped
let PartialData = <empty table>; // the condition evaluates to `false`, so the whole expression is replaced with an empty table
union AllData, PartialData       // this will just return AllData, as PartialData is empty

Similarly, if IsRestrictedUser evaluates to true, then only the query for PartialData will be evaluated.

Improve query performance when RLS is used

Performance impact on ingestion

There’s no performance impact on ingestion.

7.19 - Row order policy

7.19.1 - Row order policy

Learn how to use the row order policy to order rows in an extent.

The row order policy sets the preferred arrangement of rows within an extent. The policy is optional and set at the table level.

The main purpose of the policy is to improve the performance of queries that are narrowed to a small subset of values in ordered columns. Additionally, it may contribute to improvements in compression.

Use management commands to alter, alter-merge, delete, or show the row order policy for a table.
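For example, the following sketch (MyTable and the column names are placeholders) sets a row order policy that orders rows by a large-dimension column and then by time:

.alter table MyTable policy roworder (TenantId asc, Timestamp desc)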

When to set the policy

It’s appropriate to set the policy under the following conditions:

  • Most queries filter on specific values of a certain large-dimension column, such as an “application ID” or a “tenant ID”
  • The data ingested into the table is unlikely to be preordered according to this column

Performance considerations

There are no hardcoded limits on the number of columns, or sort keys, that can be defined as part of the policy. However, every additional column adds some overhead to the ingestion process, and as more columns are added, the effective return diminishes.

7.20 - Sandbox policy

7.20.1 - Sandbox policy

This article describes Sandbox policy.

Certain plugins run within sandboxes whose available resources are limited and controlled for security and for resource governance.

Sandboxes run on the nodes of your cluster. Some of their limitations are defined in sandbox policies, where each sandbox kind can have its own policy.

Sandbox policies are managed at cluster-level and affect all the nodes in the cluster.

Permissions

You must have AllDatabasesAdmin permissions to run this command.

The policy object

A sandbox policy has the following properties.

  • SandboxKind: Defines the type of the sandbox (such as PythonExecution or RExecution).
  • IsEnabled: Defines if sandboxes of this type may run on the cluster’s nodes.
    • The default value is false.
  • InitializeOnStartup: Defines whether sandboxes of this type are initialized on startup, or lazily, upon first use.
    • The default value is false. To ensure consistent performance and avoid any delays for running queries following service restart, set this property to true.
  • TargetCountPerNode: Defines how many sandboxes of this type are allowed to run on the cluster’s nodes.
    • Values can be between one and twice the number of processors per node.
    • The default value is 16.
  • MaxCpuRatePerSandbox: Defines the maximum CPU rate as a percentage of all available cores that a single sandbox can use.
    • Values can be between 1 and 100.
    • The default value is 50.
  • MaxMemoryMbPerSandbox: Defines the maximum amount of memory (in megabytes) that a single sandbox can use.
    • For Hyper-V technology sandboxes, values can be between 200 and 32768 (32 GB). The default value is 1024 (1 GB). The maximum memory of all sandboxes on a node (TargetCountPerNode * MaxMemoryMbPerSandbox) is 32768 (32 GB).
    • For legacy sandboxes, values can be between 200 and 65536 (64 GB). The default value is 20480 (20 GB).

If a policy isn’t explicitly defined for a sandbox kind, an implicit policy with the default values and IsEnabled set to true applies.

Example

The following policy sets different limits for PythonExecution and RExecution sandboxes:

[
  {
    "SandboxKind": "PythonExecution",
    "IsEnabled": true,
    "InitializeOnStartup": false,
    "TargetCountPerNode": 4,
    "MaxCpuRatePerSandbox": 55,
    "MaxMemoryMbPerSandbox": 8192
  },
  {
    "SandboxKind": "RExecution",
    "IsEnabled": true,
    "InitializeOnStartup": false,
    "TargetCountPerNode": 2,
    "MaxCpuRatePerSandbox": 50,
    "MaxMemoryMbPerSandbox": 10240
  }
]
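A policy like this one can be applied at the cluster level with the .alter cluster policy sandbox command, as in the following sketch (the array has the same shape as the example above; adjust the values to your workload and verify the exact syntax against the command reference):

.alter cluster policy sandbox ```[
  {
    "SandboxKind": "PythonExecution",
    "IsEnabled": true,
    "InitializeOnStartup": true,
    "TargetCountPerNode": 4,
    "MaxCpuRatePerSandbox": 55,
    "MaxMemoryMbPerSandbox": 8192
  }
]```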

7.20.2 - Sandboxes

This article describes Sandboxes.

Kusto can run sandboxes for specific flows that must be run in a secure and isolated environment. Examples of these flows are user-defined scripts that run using the Python plugin or the R plugin.

Sandboxes are run locally (meaning, processing is done close to the data), with no extra latency for remote calls.

Prerequisites and limitations

  • Sandboxes must run on VM sizes supporting nested virtualization; these sandboxes are implemented using Hyper-V technology and have no limitations.
  • The image for running the sandboxes is deployed to every cluster node and requires dedicated SSD space to run.
    • The estimated size is between 10-20 GB.
    • This affects the cluster’s data capacity, and may affect the cost of the cluster.

Runtime

  • A sandboxed query operator may use one or more sandboxes for its execution.
    • A sandbox is only used for a single query and is disposed of once that query completes.
    • When a node is restarted, for example, as part of a service upgrade, all running sandboxes on it are disposed of.
  • Each node maintains a predefined number of sandboxes that are ready for running incoming requests.
    • Once a sandbox is used, a new one is automatically made available to replace it.
  • If there are no pre-allocated sandboxes available to serve a query operator, it will be throttled until new sandboxes are available. For more information, see Errors. New sandbox allocation could take up to 10-15 seconds per sandbox, depending on the SKU and available resources on the data node.

Sandbox parameters

Some of the parameters can be controlled using a cluster-level sandbox policy, for each kind of sandbox.

  • Number of sandboxes per node: The number of sandboxes per node is limited.
    • Requests that are made when there’s no available sandbox will be throttled.
  • Initialize on startup: if set to false (default), sandboxes are lazily initialized on a node, the first time a query requires a sandbox for its execution. Otherwise, if set to true, sandboxes are initialized as part of service startup.
    • This means that the first execution of a plugin that uses sandboxes on a node will include a short warm-up period.
  • CPU: The maximum rate of CPU a sandbox can consume of its host’s processors is limited (default is 50%).
    • When the limit is reached, the sandbox’s CPU use is throttled, but execution continues.
  • Memory: The maximum amount of RAM a sandbox can consume of its host’s RAM is limited.
    • Default memory for Hyper-V technology is 1 GB, and for legacy sandboxes 20 GB.
    • Reaching the limit results in termination of the sandbox, and a query execution error.

Sandbox limitations

  • Network: A sandbox can’t interact with any resource on the virtual machine (VM) or outside of it.
    • A sandbox can’t interact with another sandbox.

Errors

| ErrorCode | Status | Message | Potential reason |
|--|--|--|--|
| E_SB_QUERY_THROTTLED_ERROR | TooManyRequests (429) | The sandboxed query was aborted because of throttling. Retrying after some backoff might succeed | There are no available sandboxes on the target node. New sandboxes should become available in a few seconds |
| E_SB_QUERY_THROTTLED_ERROR | TooManyRequests (429) | Sandboxes of kind '{kind}' haven't yet been initialized | The sandbox policy has recently changed. New sandboxes obeying the new policy will become available in a few seconds |
| | InternalServiceError (520) | The sandboxed query was aborted due to a failure in initializing sandboxes | An unexpected infrastructure failure. |

VM Sizes supporting nested virtualization

The following table lists all modern VM sizes that support Hyper-V sandbox technology.

| Name | Category |
|--|--|
| Standard_L8s_v3 | storage-optimized |
| Standard_L16s_v3 | storage-optimized |
| Standard_L8as_v3 | storage-optimized |
| Standard_L16as_v3 | storage-optimized |
| Standard_E8as_v5 | storage-optimized |
| Standard_E16as_v5 | storage-optimized |
| Standard_E8s_v4 | storage-optimized |
| Standard_E16s_v4 | storage-optimized |
| Standard_E8s_v5 | storage-optimized |
| Standard_E16s_v5 | storage-optimized |
| Standard_E2ads_v5 | compute-optimized |
| Standard_E4ads_v5 | compute-optimized |
| Standard_E8ads_v5 | compute-optimized |
| Standard_E16ads_v5 | compute-optimized |
| Standard_E2d_v4 | compute-optimized |
| Standard_E4d_v4 | compute-optimized |
| Standard_E8d_v4 | compute-optimized |
| Standard_E16d_v4 | compute-optimized |
| Standard_E2d_v5 | compute-optimized |
| Standard_E4d_v5 | compute-optimized |
| Standard_E8d_v5 | compute-optimized |
| Standard_E16d_v5 | compute-optimized |
| Standard_D32d_v4 | compute-optimized |

7.21 - Sharding policy

7.21.1 - Data sharding policy

Learn how to use the data sharding policy to define if and how extents in the database are created.

The sharding policy defines if and how extents (data shards) in your cluster are created. You can only query data in an extent once it’s created.

The data sharding policy contains the following properties:

  • ShardEngineMaxRowCount:

    • Maximum row count for an extent created by an ingestion or rebuild operation.
    • Defaults to 1,048,576.
    • Not in effect for merge operations.
      • If you must limit the number of rows in extents created by merge operations, adjust the RowCountUpperBoundForMerge property in the entity’s extents merge policy.
  • ShardEngineMaxExtentSizeInMb:

    • Maximum allowed compressed data size (in megabytes) for an extent created by a merge or rebuild operation.
    • Defaults to 8,192 (8 GB).
  • ShardEngineMaxOriginalSizeInMb:

    • Maximum allowed original data size (in megabytes) for an extent created by a rebuild operation.
    • In effect only for rebuild operations.
    • Defaults to 3,072 (3 GB).

When a database is created, it contains the default data sharding policy. This policy is inherited by all tables created in the database (unless the policy is explicitly overridden at the table level).

Use the sharding policy management commands to manage data sharding policies for databases and tables.
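For example, the following sketch (MyTable is a placeholder) overrides a single property of a table's data sharding policy, leaving the other properties at their defaults; verify the exact syntax against the sharding policy command reference:

.alter-merge table MyTable policy sharding @'{"ShardEngineMaxRowCount": 750000}'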

7.22 - Streaming ingestion policy

7.22.1 - Streaming ingestion policy

Learn how to use the streaming ingestion policy to optimize operational processing of many tables where the stream of data is small.

Streaming ingestion target scenarios

Streaming ingestion should be used for the following scenarios:

  • Latency of less than a few seconds is required.
  • To optimize operational processing of many tables where the stream of data into each table is relatively small (a few records per second), but the overall data ingestion volume is high (thousands of records per second).

If the stream of data into each table is high (over 4 GB per hour), consider using queued ingestion.

Streaming ingestion policy definition

The streaming ingestion policy contains the following properties:

  • IsEnabled:
    • Defines the status of streaming ingestion functionality for the table or database.
    • Mandatory. There's no default value; it must explicitly be set to true or false.
  • HintAllocatedRate:
    • If set, provides a hint on the hourly volume of data, in gigabytes, expected for the table. This hint helps the system adjust the amount of resources that are allocated for the table in support of streaming ingestion.
    • Defaults to null (unset).

To enable streaming ingestion on a table, define the streaming ingestion policy with IsEnabled set to true. This definition can be set on a table itself or on the database. Defining this policy at the database level applies the same settings to all existing and future tables in the database. If the streaming ingestion policy is set at both the table and database levels, the table level setting takes precedence. This setting means that streaming ingestion can be generally enabled for the database but specifically disabled for certain tables, or the other way around.

Set the data rate hint

The streaming ingestion policy can provide a hint about the hourly volume of data expected for the table. This hint helps the system adjust the amount of resources allocated for the table in support of streaming ingestion. Set the hint if the rate of streaming data ingress into the table exceeds 1 GB per hour. If you set HintAllocatedRate in the database-level streaming ingestion policy, set it according to the table with the highest expected data rate. Don't set the effective hint for a table to a value much higher than the expected peak hourly data rate, because doing so might adversely affect query performance.
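For example, the following sketch (MyTable is a placeholder) enables streaming ingestion on a table and hints at an expected rate of about 2 GB per hour; verify the exact syntax against the streaming ingestion policy command reference:

.alter table MyTable policy streamingingestion '{"IsEnabled": true, "HintAllocatedRate": 2.1}'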

7.23 - Update policy

7.23.1 - Common scenarios for using table update policies

Learn about common scenarios that can use table update policies to perform complex transformations and save the results to destination tables.

This section describes some well-known scenarios that use update policies. Consider adopting these scenarios when your circumstances are similar.

In this article, you learn about the following common scenarios:

Medallion architecture data enrichment

Update policies on tables provide an efficient way to apply rapid transformations and are compatible with the medallion lakehouse architecture in Fabric.

In the medallion architecture, when raw data lands in a landing table (bronze layer), an update policy can be used to apply initial transformations and save the enriched output to a silver layer table. This process can cascade, where the data from the silver layer table can trigger another update policy to further refine the data and hydrate a gold layer table.

The following diagram illustrates an example of a data enrichment update policy named Get_Values. The enriched data is output to a silver layer table, which includes a calculated timestamp value and lookup values based on the raw data.

Diagram showing the medallion architecture data enrichment scenario using update policies solution.

Data routing

A special case of data enrichment occurs when a raw data element contains data that must be routed to a different table based on one or more attributes of the data itself.

Consider an example that uses the same base data as the previous scenario, but this time there are three messages. The first message is a device telemetry message, the second message is a device alarm message, and the third message is an error.

To handle this scenario, three update policies are used. The Get_Telemetry update policy filters the device telemetry message, enriches the data, and saves it to the Device_Telemetry table. Similarly, the Get_Alarms update policy saves the data to the Device_Alarms table. Lastly, the Log_Error update policy sends unknown messages to the Error_Log table, allowing operators to detect malformed messages or unexpected schema evolution.

The following diagram depicts the example with the three update policies.

Diagram showing the data routing scenario using update policies solution.
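As a minimal sketch of how one of these routing policies could be wired up (the function and destination table names match the scenario above, while the source table name, message schema, and column names are assumptions for illustration), a filtering function is created and then attached to the destination table as an update policy:

.create-or-alter function Get_Telemetry() {
    Raw_Data                              // assumed bronze/landing table
    | where MessageType == "Telemetry"    // assumed routing attribute
    | project Timestamp, DeviceId, Telemetry
}

.alter table Device_Telemetry policy update
```
[
    {
        "IsEnabled": true,
        "Source": "Raw_Data",
        "Query": "Get_Telemetry()",
        "IsTransactional": false,
        "PropagateIngestionProperties": false
    }
]
```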

Optimize data models

Update policies on tables are built for speed. Tables typically conform to star schema design, which supports the development of data models that are optimized for performance and usability.

Querying tables in a star schema often requires joining tables. However, table joins can lead to performance issues, especially when querying high volumes of data. To improve query performance, you can flatten the model by storing denormalized data at ingestion time.

Joining tables at ingestion time has the added benefit of operating on a small batch of data, resulting in a reduced computational cost of the join. This approach can massively improve the performance of downstream queries.

For example, you can enrich raw telemetry data from a device by looking up values from a dimension table. An update policy can perform the lookup at ingestion time and save the output to a denormalized table. Furthermore, you can extend the output with data sourced from a reference data table.

The following diagram depicts the example, which comprises an update policy named Enrich_Device_Data. It extends the output data with data sourced from the Site reference data table.

Diagram showing the optimized data models scenario using update policies solution.

7.23.2 - Run an update policy with a managed identity

This article describes how to use a managed identity for update policy.

The update policy must be configured with a managed identity in the following scenarios:

  • When the update policy query references tables in other databases
  • When the update policy query references tables with an enabled row level security policy

An update policy configured with a managed identity is performed on behalf of the managed identity.

In this article, you learn how to configure a system-assigned or user-assigned managed identity and create an update policy using that identity.

Prerequisites

Configure a managed identity

There are two types of managed identities:

  • System-assigned: A system-assigned identity is connected to your cluster and is removed when the cluster is removed. Only one system-assigned identity is allowed per cluster.

  • User-assigned: A user-assigned managed identity is a standalone Azure resource. Multiple user-assigned identities can be assigned to your cluster.

Select one of the following tabs to set up your preferred managed identity type.

User-assigned

  1. Follow the steps to Add a user-assigned identity.

  2. In the Azure portal, in the left menu of your managed identity resource, select Properties. Copy and save the Tenant Id and Principal ID for use in the following steps.

    Screenshot of Azure portal area with managed identity IDs.

  3. Run the following .alter-merge policy managed_identity command, replacing <objectId> with the managed identity Principal ID from the previous step. This command sets a managed identity policy on the cluster that allows the managed identity to be used with the update policy.

    .alter-merge cluster policy managed_identity ```[
        {
          "ObjectId": "<objectId>",
          "AllowedUsages": "AutomatedFlows"
        }
    ]```
    

    [!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

  4. Run the following command to grant the managed identity Database Viewer permissions over all databases referenced by the update policy query.

    .add database <DatabaseName> viewers ('aadapp=<objectId>;<tenantId>')
    

    Replace <DatabaseName> with the relevant database, <objectId> with the managed identity Principal ID from step 2, and <tenantId> with the Microsoft Entra ID Tenant Id from step 2.

System-assigned

  1. Follow the steps to Add a system-assigned identity.

  2. Copy and save the Object ID for use in a later step.

  3. Run the following .alter-merge policy managed_identity command. This command sets a managed identity policy on the cluster that allows the managed identity to be used with the update policy.

    .alter-merge cluster policy managed_identity ```[
        {
          "ObjectId": "system",
          "AllowedUsages": "AutomatedFlows"
        }
    ]```
    

    [!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

  4. Run the following command to grant the managed identity Database Viewer permissions over all databases referenced by the update policy query.

    .add database <DatabaseName> viewers ('aadapp=<objectId>')
    

    Replace <DatabaseName> with the relevant database and <objectId> with the managed identity Object ID you saved earlier.

Create an update policy

Select one of the following tabs to create an update policy that runs on behalf of a user-assigned or system-assigned managed identity.

User-assigned

Run the .alter table policy update command with the ManagedIdentity property set to the managed identity object ID.

For example, the following command alters the update policy of the table MyTable in the database MyDatabase. It’s important to note that both the Source and Query parameters should only reference objects within the same database where the update policy is defined. However, the code contained within the function specified in the Query parameter can interact with tables located in other databases. For example, the function MyUpdatePolicyFunction() can access OtherTable in OtherDatabase on behalf of a user-assigned managed identity. <objectId> should be a managed identity object ID.

.alter table MyDatabase.MyTable policy update
```
[
    {
        "IsEnabled": true,
        "Source": "MyTable",
        "Query": "MyUpdatePolicyFunction()",
        "IsTransactional": false,
        "PropagateIngestionProperties": false,
        "ManagedIdentity": "<objectId>"
    }
]
```

System-assigned

Run the .alter table policy update command with the ManagedIdentity property set to system.

For example, the following command alters the update policy of the table MyTable in the database MyDatabase. It’s important to note that both the Source and Query parameters should only reference objects within the same database where the update policy is defined. However, the code contained within the function specified in the Query parameter can interact with tables located in other databases. For example, the function MyUpdatePolicyFunction() can access OtherTable in OtherDatabase on behalf of your system-assigned managed identity.

.alter table MyDatabase.MyTable policy update
```
[
    {
        "IsEnabled": true,
        "Source": "MyTable",
        "Query": "MyUpdatePolicyFunction()",
        "IsTransactional": false,
        "PropagateIngestionProperties": false,
        "ManagedIdentity": "system"
    }
]
```

7.23.3 - Update policy overview

Learn how to trigger an update policy to add data to a source table.

Update policies are automation mechanisms triggered when new data is written to a table. They eliminate the need for special orchestration by running a query to transform the ingested data and save the result to a destination table. Multiple update policies can be defined on a single table, allowing for different transformations and saving data to multiple tables simultaneously. The target tables can have a different schema, retention policy, and other policies from the source table.

For example, a high-rate trace source table can contain data formatted as a free-text column. The target table can include specific trace lines, with a well-structured schema generated from a transformation of the source table’s free-text data using the parse operator. For more information, see common scenarios.

The following diagram depicts a high-level view of an update policy. It shows two update policies that are triggered when data is added to the second source table. Once they’re triggered, transformed data is added to the two target tables.

Diagram shows an overview of the update policy.

An update policy is subject to the same restrictions and best practices as regular ingestion. The policy scales out according to the cluster or Eventhouse size, and is more efficient when handling bulk ingestion.

Ingesting formatted data improves performance, and CSV is preferred because it’s a well-defined format. Sometimes, however, you have no control over the format of the data, or you want to enrich ingested data, for example, by joining records with a static dimension table in your database.

Update policy query

Update policies are defined on the target table, so multiple queries can run over data ingested into a single source table. If there are multiple update policies, the order of execution isn’t necessarily known.

Query limitations

  • The policy-related query can invoke stored functions, but:
    • It can’t perform cross-cluster or cross-eventhouse queries.
    • It can’t access external data or external tables.
    • It can’t make callouts (by using a plugin).
  • The query doesn’t have read access to tables that have the RestrictedViewAccess policy enabled.
  • For update policy limitations in streaming ingestion, see streaming ingestion limitations.
  • In an Eventhouse, the Streaming ingestion policy is enabled by default for all tables. To use functions with the join operator in an update policy, the streaming ingestion policy must be disabled. Use the .alter table TableName policy streamingingestion PolicyObject command to disable it.

When referencing the Source table in the Query part of the policy, or in functions referenced by the Query part:

  • Don’t use the qualified name of the table. Instead, use TableName.
  • Don’t use database("<DatabaseName>").TableName or cluster("<ClusterName>").database("<DatabaseName>").TableName (the same applies to the equivalent Eventhouse-qualified form).

The update policy object

A table can have zero or more update policy objects associated with it. Each such object is represented as a JSON property bag, with the following properties defined.

PropertyTypeDescription
IsEnabledboolStates if update policy is true - enabled, or false - disabled
SourcestringName of the table that triggers invocation of the update policy
QuerystringA query used to produce data for the update
IsTransactionalboolStates if the update policy is transactional or not, default is false. If the policy is transactional and the update policy fails, the source table isn’t updated.
PropagateIngestionPropertiesboolStates if properties specified during ingestion to the source table, such as extent tags and creation time, apply to the target table.
ManagedIdentitystringThe managed identity on behalf of which the update policy runs. The managed identity can be an object ID, or the system reserved word. The update policy must be configured with a managed identity when the query references tables in other databases or tables with an enabled row level security policy. For more information, see Use a managed identity to run an update policy.

Management commands

Update policy management commands include:
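
These include the .show, .alter, .alter-merge, and .delete variants of the table policy update command. A minimal sketch, with MyTable as a placeholder name:

.show table MyTable policy update      // show the update policy objects defined on MyTable
.delete table MyTable policy update    // remove the update policy from MyTable

The .alter and .alter-merge forms, shown elsewhere in this article, set or modify the policy object itself.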

Update policy is initiated following ingestion

Update policies take effect when data is ingested or moved to a source table, or extents are created in a source table. These actions can be done using any of the following commands:
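
Typical triggers are the data ingestion commands (.set, .append, .set-or-append, .set-or-replace), ingestion from storage (.ingest), and the extent-level commands .move extents and .replace extents. A minimal sketch, assuming a source table named MySourceTable with a single string column:

// Appending query results to the source table triggers any update policy that names it as Source.
.set-or-append MySourceTable <| print OriginalRecord = "sample raw record"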

Remove data from source table

After ingesting data to the target table, you can optionally remove it from the source table: set a soft-delete period of 0sec (or 00:00:00) in the source table’s retention policy, and set the update policy as transactional, as sketched after the following list. The following conditions apply:

  • The source data isn’t queryable from the source table
  • The source data doesn’t persist in durable storage as part of the ingestion operation
  • Operational performance improves. Post-ingestion resources are reduced for background grooming operations on extents in the source table.
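
A minimal sketch, assuming a source table named Staging whose transactional update policy already copies transformed data to a target table:

// A zero soft-delete period means the source data isn't retained after ingestion;
// the transactional update policy ensures the transformed data is committed to the
// target table as part of the same ingestion operation.
.alter-merge table Staging policy retention softdelete = 0s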

Performance impact

Update policies can affect performance: ingestion of data extents is multiplied by the number of target tables, so it’s important to optimize the policy-related query. You can test an update policy’s performance impact by invoking the policy on existing extents, before creating or altering the policy, or on the function used with the query.

Evaluate resource usage

Use .show queries to evaluate resource usage (CPU, memory, and so on) with the following parameters:

  • Set the Source property, the source table name, as MySourceTable
  • Set the Query property to call a function named MyFunction()
// '_extentId' is the ID of a recently created extent, that likely hasn't been merged yet.
let _extentId = toscalar(
    MySourceTable
    | project ExtentId = extent_id(), IngestionTime = ingestion_time()
    | where IngestionTime > ago(10m)
    | top 1 by IngestionTime desc
    | project ExtentId
);
// This scopes the source table to the single recent extent.
let MySourceTable =
    MySourceTable
    | where ingestion_time() > ago(10m) and extent_id() == _extentId;
// This invokes the function in the update policy (that internally references `MySourceTable`).
MyFunction

Transactional settings

The update policy IsTransactional setting defines whether the update policy is transactional and can affect the behavior of the policy update, as follows:

  • IsTransactional:false: If the value is set to the default value, false, the update policy doesn’t guarantee consistency between data in the source and target tables. If an update policy fails, data is ingested only to the source table and not to the target table. In this scenario, the ingestion operation is successful.
  • IsTransactional:true: If the value is set to true, the setting does guarantee consistency between data in the source and target tables. If an update policy fails, data isn’t ingested to the source or target table. In this scenario, the ingestion operation is unsuccessful.

Handling failures

When policy updates fail, they’re handled differently based on whether the IsTransactional setting is true or false. Common reasons for update policy failures are:

  • A mismatch between the query output schema and the target table.
  • Any query error.

You can view update policy failures by using the .show ingestion failures command, as shown below. If needed, you can manually retry the ingestion.

.show ingestion failures
| where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true

Example of extract, transform, load

You can use update policy settings to perform extract, transform, load (ETL).

In this example, use an update policy with a simple function to perform ETL. First, we create two tables:

  • The source table - Contains a single string-typed column into which data is ingested.
  • The target table - Contains the desired schema. The update policy is defined on this table.
  1. Let’s create the source table:

    .create table MySourceTable (OriginalRecord:string)
    
  2. Next, create the target table:

    .create table MyTargetTable (Timestamp:datetime, ThreadId:int, ProcessId:int, TimeSinceStartup:timespan, Message:string)
    
  3. Then create a function to extract data:

    .create function
     with (docstring = 'Parses raw records into strongly-typed columns', folder = 'UpdatePolicyFunctions')
         ExtractMyLogs()
        {
        MySourceTable
        | parse OriginalRecord with "[" Timestamp:datetime "] [ThreadId:" ThreadId:int "] [ProcessId:" ProcessId:int "] TimeSinceStartup: " TimeSinceStartup:timespan " Message: " Message:string
        | project-away OriginalRecord
    }
    
  4. Now, set the update policy to invoke the function that we created:

    .alter table MyTargetTable policy update
    @'[{ "IsEnabled": true, "Source": "MySourceTable", "Query": "ExtractMyLogs()", "IsTransactional": true, "PropagateIngestionProperties": false}]'
    
  5. To empty the source table after data is ingested into the target table, define the retention policy on the source table to have 0s as its SoftDeletePeriod.

     .alter-merge table MySourceTable policy retention softdelete = 0s
    

8 - Query results cache

8.1 - Query results cache commands

This article describes Query results cache.

The query results cache is a cache dedicated for storing query results. For more information, see Query results cache.

Query results cache commands

Kusto provides two commands for cache management and observability:
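
A short sketch of the two commands, which show and clear the database-level query results cache:

.show database cache query_results     // statistics about cached query results in the current database
.clear database cache query_results    // drops all cached query results in the current database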

9 - Schema

9.1 - Avrotize k2a tool

Learn how to use the Avrotize k2a command to connect to a Kusto database and create an Avro schema.

Avrotize is a versatile tool for converting data and database schema formats, and generating code in various programming languages. The tool supports the conversion of Kusto table schemas to Apache Avro format and vice versa with the Convert Kusto table definition to Avrotize Schema command. The tool handles dynamic columns in Kusto tables by:

  • Inferring the schema through sampling
  • Resolving arrays and records at any level of nesting
  • Detecting conflicting schemas
  • Creating type unions for each different schema branch

Convert table definition to AVRO format

You can use the avrotize k2a command to connect to a Kusto database and create an Avro schema with a record type for each of the tables in the database.

The following are examples of how to use the command:

  • Create an Avro schema with a top-level union with a record for each table:

    avrotize k2a --kusto-uri <Uri> --kusto-database <DatabaseName> --avsc <AvroFilename.avsc>
    
  • Create an xRegistry catalog file with CloudEvent wrappers and per-event schemas:

    In the following example, you create xRegistry catalog files with schemas for each table. If the input table contains CloudEvents identified by columns like id, source, and type, the tool creates separate schemas for each event type.

    avrotize k2a --kusto-uri <URI> --kusto-database <DatabaseName> --avsc <AvroFilename.xreg.json> --emit-cloudevents-xregistry --avro-namespace <AvroNamespace>
    

Convert AVRO schema to Kusto table declaration

You can use the avrotize a2k command to create KQL table declarations from Avro schema and JSON mappings. It can also include docstrings in the table declarations extracted from the “doc” annotations in the Avro record types.

If the Avro schema is a single record type, the output script includes a .create table command for the record. The record fields are converted into columns in the table. If the Avro schema is a type union (a top-level array), the output script emits a separate .create table command for each record type in the union.

avrotize a2k  .\<AvroFilename.avsc> --out <KustoFilename.kql>

The Avrotize tool is capable of converting JSON Schema, XML Schema, ASN.1 Schema, and Protobuf 2 and Protobuf 3 schemas into Avro schema. You can first convert the source schema into an Avro schema to normalize it and then convert it into Kusto schema.

For example, the following command converts the input JSON Schema document “address.json” into an Avro schema to normalize it:

avrotize j2a address.json --out address.avsc

Then convert the Avro schema file into Kusto schema:

avrotize a2k address.avsc --out address.kql

You can also chain the commands together to convert from JSON Schema via Avro into Kusto schema:

avrotize j2a address.json | avrotize a2k --out address.kql

9.2 - Best practices for schema management

This article describes Best practices for schema management.

Follow these best practices to make your management commands work more efficiently and to reduce their impact on service resources.

  • Create multiple tables: use a single .create tables command instead of issuing many .create table commands.
  • Rename multiple tables: make a single call to .rename tables instead of a separate call for each pair of tables.
  • Show commands: use the lowest-scoped .show command and don’t apply filters after a pipe (|). Limit their use as much as possible and, when possible, cache the information they return.
  • Show extents: use .show table T extents instead of .show cluster extents | where TableName == 'T'.
  • Show database schema: use .show database DB schema instead of .show schema | where DatabaseName == 'DB'.
  • Show a large schema (for example, an environment with more than 100 databases): use .show databases schema instead of .show schema.
  • Check a table’s existence or get the table’s schema: use .show table T schema as json. Use .show table T only to get actual statistics on a single table.
  • Define the schema for a table that will include datetime values: set the relevant columns to the datetime type. Don’t convert string or numeric columns to datetime at query time for filtering, if that can be done before or during ingestion time.
  • Add extent tags to metadata sparingly. Avoid drop-by: tags, which limit the system’s ability to do performance-oriented grooming processes in the background. See performance notes.
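
For example, the first recommendation means creating several tables in one command rather than issuing one command per table. A minimal sketch, with placeholder table and column names:

// One command creates both tables; two separate .create table commands
// would cost two metadata transactions instead of one.
.create tables
    MyLogs (Timestamp: datetime, Level: string, Text: string),
    MyUsers (UserName: string, CreatedOn: datetime)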

9.3 - Columns

9.3.1 - Change column type without data loss

Learn how to preserve preexisting data by changing column type without data loss.

The .alter column command changes the column type, making the original data unrecoverable. To preserve preexisting data while changing the column type, create a new, properly typed table.

For each table OriginalTable you’d like to change a column type in, execute the following steps:

  1. Create a table NewTable with the correct schema (the right column types and the same column order).

  2. Ingest the data into NewTable from OriginalTable, applying the required data transformations. In the following example, Col1 is being converted to the string data type.

    .set-or-append NewTable <| OriginalTable | extend Col1=tostring(Col1)
    
  3. Use the .rename tables command to swap table names.

    .rename tables NewTable=OriginalTable, OriginalTable=NewTable
    

    When the command completes, new data from existing ingestion pipelines flows to OriginalTable, which is now correctly typed.

  4. Drop the table NewTable.

    NewTable includes only a copy of the historical data from before the schema change. It can be safely dropped after confirming the schema and data in OriginalTable were correctly updated.

    .drop table NewTable
    

Example

The following example updates the schema of OriginalTable while preserving its data.

Create the table, OriginalTable, with a column, “Col1,” of type guid.

.create table OriginalTable (Col1:guid, Id:int)

Then ingest data into OriginalTable.

.ingest inline into table OriginalTable <|
b642dec0-1040-4eac-84df-a75cfeba7aa4,1
c224488c-ad42-4e6c-bc55-ae10858af58d,2
99784a64-91ad-4897-ae0e-9d44bed8eda0,3
d8857a93-2728-4bcb-be1d-1a2cd35386a7,4
b1ddcfcc-388c-46a2-91d4-5e70aead098c,5

Create the table NewTable, with the Col1 column of type string.

.create table NewTable (Col1:string, Id:int)

Append data from OriginalTable to NewTable and use the tostring() function to convert the “Col1” column from type guid to type string.

.set-or-append NewTable <| OriginalTable | extend Col1=tostring(Col1)

Swap the table names.

.rename tables NewTable = OriginalTable, OriginalTable = NewTable

Drop the table NewTable, which contains only the old schema and data.

.drop table NewTable

9.3.2 - Columns management

This article describes Columns management.

This section describes the following management commands used for managing table columns:

CommandDescription
.alter columnAlters the data type of an existing table column
.alter-merge column docstrings and .alter column docstringsSets the docstring property of one or more columns of the specified table
.alter table, .alter-merge tableModify the schema of a table (add/remove columns)
.drop column and .drop table columnsRemoves one or multiple columns from a table
.rename column and .rename columnsChanges the name of one or more existing table columns
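
For example, a minimal sketch of the first command, with placeholder table and column names:

// Changes the type of column Col1 in MyTable to string. Existing data in the column becomes
// unrecoverable; see the previous section to change a column type without data loss.
.alter column ['MyTable'].['Col1'] type=string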

9.4 - Databases

9.5 - External tables

9.5.1 - Azure SQL external tables

9.5.1.1 - Create and alter Azure SQL external tables

Learn how to create and alter an SQL external table.

Creates or alters an Azure SQL external table in the database in which the command is executed.

Supported Azure SQL external table types

  1. SQL Server
  2. MySQL
  3. PostgreSQL
  4. Cosmos DB

Permissions

To .create requires at least Database User permissions and to .alter requires at least Table Admin permissions.

To .create, .alter, or .create-or-alter an external table using managed identity authentication requires Database Admin permissions. This method is supported for SQL Server and Cosmos DB external tables.

Syntax

(.create | .alter | .create-or-alter) external table TableName (Schema) kind = sql [ table = SqlTableName ] (SqlConnectionString) [with ( [ sqlDialect = SqlDialect ] , [ Property , … ])]

Parameters

NameTypeRequiredDescription
TableNamestring✔️The name of the external table. The name must follow the rules for entity names, and an external table can’t have the same name as a regular table in the same database.
Schemastring✔️The external data schema is a comma-separated list of one or more column names and data types, where each item follows the format: ColumnName : ColumnType.
SqlTableNamestringThe name of the SQL table not including the database name. For example, “MySqlTable” and not “db1.MySqlTable”. If the name of the table contains a period ("."), use [‘Name.of.the.table’] notation.


This specification is required for all types of tables except for Cosmos DB, as for Cosmos DB the collection name is part of the connection string.
SqlConnectionStringstring✔️The connection string to the SQL server.
SqlDialectstringIndicates the type of Azure SQL external table. SQL Server is the default. For MySQL, specify MySQL. For PostgreSQL, specify PostgreSQL. For Cosmos DB, specify CosmosDbSql.
PropertystringA key-value property pair in the format PropertyName = PropertyValue. See optional properties.

Optional properties

PropertyTypeDescription
folderstringThe table’s folder.
docStringstringA string documenting the table.
firetriggerstrue/falseIf true, instructs the target system to fire INSERT triggers defined on the SQL table. The default is false. (For more information, see BULK INSERT and System.Data.SqlClient.SqlBulkCopy)
createifnotexiststrue/ falseIf true, the target SQL table is created if it doesn’t already exist; the primarykey property must be provided in this case to indicate the result column that is the primary key. The default is false.
primarykeystringIf createifnotexists is true, the resulting column name is used as the SQL table’s primary key if it’s created by this command.

Authentication and authorization

To interact with an external Azure SQL table, you must specify authentication means as part of the SqlConnectionString. The SqlConnectionString defines the resource to access and its authentication information.

For more information, see Azure SQL external table authentication methods.

Examples

The following examples show how to create each type of Azure SQL external table.

SQL Server

.create external table MySqlExternalTable (x:long, s:string) 
kind=sql
table=MySqlTable
( 
   h@'Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=mydatabase;'
)
with 
(
   docstring = "Docs",
   folder = "ExternalTables", 
   createifnotexists = true,
   primarykey = x,
   firetriggers=true
)  

Output

TableNameTableTypeFolderDocStringProperties
MySqlExternalTableSqlExternalTablesDocs{
“TargetEntityKind”: “sqltable”,
“TargetEntityName”: “MySqlTable”,
“TargetEntityConnectionString”: “Server=tcp:myserver.database.windows.net,1433;Authentication=Active Directory Integrated;Initial Catalog=mydatabase;”,
“FireTriggers”: true,
“CreateIfNotExists”: true,
“PrimaryKey”: “x”
}

MySQL

.create external table MySqlExternalTable (x:long, s:string) 
kind=sql
table=MySqlTable
( 
   h@'Server=myserver.mysql.database.windows.net;Port = 3306;UID = USERNAME;Pwd = PASSWORD;Database = mydatabase;'
)
with 
(
   sqlDialect = "MySql",
   docstring = "Docs",
   folder = "ExternalTables", 
)  

PostgreSQL

.create external table PostgreSqlExternalTable (x:long, s:string) 
kind=sql
table=PostgreSqlTable
( 
   h@'Host = hostname.postgres.database.azure.com; Port = 5432; Database= db; User Id=user; Password=pass; Timeout = 30;'
)
with 
(
   sqlDialect = "PostgreSQL",
   docstring = "Docs",
   folder = "ExternalTables", 
)  

Cosmos DB

.create external table CosmosDBSQLExternalTable (x:long, s:string) 
kind=sql
( 
   h@'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=MyDatabase;Collection=MyCollection;AccountKey=' h'R8PM...;'
)
with 
(
   sqlDialect = "CosmosDbSQL",
   docstring = "Docs",
   folder = "ExternalTables", 
)  

9.5.1.2 - Query SQL external tables

This article describes how to query external tables based on SQL tables.

You can query a SQL external table just as you would query a table in Azure Data Explorer or in a KQL database.

How it works

Azure SQL external table queries are translated from Kusto Query Language (KQL) to SQL. The operators after the external_table function call, such as where, project, count, and so on, are pushed down and translated into a single SQL query to be executed against the target SQL table.

Example

For example, consider an external table named MySqlExternalTable with two columns x and s. In this case, the following KQL query is translated into the following SQL query.

KQL query

external_table("MySqlExternalTable")
| where x > 5 
| count

SQL query

SELECT COUNT(*) FROM (SELECT x, s FROM MySqlTable WHERE x > 5) AS Subquery1

9.5.1.3 - Use row-level security with Azure SQL external tables

This document describes how to create a row-level security solution with SQL external tables.

Apply row-level security on Azure SQL external tables

This document describes how to apply a row-level security (RLS) solution with SQL external tables. Row-level security implements data isolation at the user level, restricting access to data based on the current user credential. However, Kusto external tables don’t support RLS policy definitions, so data isolation on external SQL tables requires a different approach. The following solution uses row-level security in SQL Server together with Microsoft Entra ID impersonation in the SQL Server connection string. This combination provides the same behavior as applying user access control with RLS on standard Kusto tables: users querying the SQL external table see only the records addressed to them, based on the row-level security policy defined in the source database.

Prerequisites

Sample table

The example source is a SQL Server table called SourceTable, with the following schema. The systemuser column contains the user email to whom the data record belongs. This is the same user who should have access to this data.

CREATE TABLE SourceTable (
    id INT,
    region VARCHAR(5),
    central VARCHAR(5),
    systemuser VARCHAR(200)
)

Configure row-level security in the source SQL Server - SQL Server side

For general information on SQL Server row-level security, see row-level security in SQL Server.

  1. Create a SQL Function with the logic for the data access policy. In this example, the row-level security is based on the current user’s email matching the systemuser column. This logic could be modified to meet any other business requirement.

    CREATE SCHEMA Security;
    GO
    
    CREATE FUNCTION Security.mySecurityPredicate(@CheckColumn AS nvarchar(100))
        RETURNS TABLE
    WITH SCHEMABINDING
    AS
        RETURN SELECT 1 AS mySecurityPredicate_result
        WHERE @CheckColumn = ORIGINAL_LOGIN() OR USER_NAME() = 'Manager';
    GO
    
  2. Create the security policy on the table SourceTable, passing the column name as the parameter:

    CREATE SECURITY POLICY SourceTableFilter
    ADD FILTER PREDICATE Security.mySecurityPredicate(systemuser)
    ON dbo.SourceTable
    WITH (STATE = ON)
    GO
    

    [!NOTE] At this point, the data is already restricted by the mySecurityPredicate function logic.

Allow user access to SQL Server - SQL Server side

The following steps depend on the SQL Server version that you’re using.

  1. Create a login and a user for each Microsoft Entra ID credential that is going to access the data stored in SQL Server:

    CREATE LOGIN [user@domain.com] FROM EXTERNAL PROVIDER --MASTER
    
    CREATE USER [user@domain.com] FROM EXTERNAL PROVIDER --DATABASE
    
  2. Grant SELECT on the Security function to the Microsoft Entra ID user:

    GRANT SELECT ON Security.mySecurityPredicate to [user@domain.com]
    
  3. Grant SELECT on the SourceTable to the Microsoft Entra ID user:

    GRANT SELECT ON dbo.SourceTable to [user@domain.com]
    

Define SQL external table connection String - Kusto side

For more information on the connection string, see SQL External Table Connection Strings.

  1. Create a SQL external table using a connection string with the Active Directory Integrated authentication type. For more information, see Microsoft Entra integrated (impersonation).

    .create external table SQLSourceTable (id:long, region:string, central:string, systemuser:string) 
    kind=sql
    table=SourceTable
    ( 
       h@'Server=tcp:[sql server endpoint],1433;Authentication=Active Directory Integrated;Initial Catalog=[database name];'
    )
    with 
    (
       docstring = "Docs",
       folder = "ExternalTables", 
       createifnotexists = false,
       primarykey = 'id'
    )
    

    Connection String:

    Server=tcp:[sql server endpoint],1433;Authentication=Active Directory Integrated;Initial Catalog=[database name];
    
  2. Validate the data isolation based on the Microsoft Entra ID, just as it would work with row-level security in Kusto. In this case, the data is filtered based on the SourceTable’s systemuser column, matching the Microsoft Entra ID user (email address) from the Kusto impersonation:

    external_table('SQLSourceTable')
    

    [!NOTE] The policy can be disabled and enabled again, on the SQL Server side, for testing purposes.

To disable and enable the policy, use the following SQL commands:

ALTER SECURITY POLICY SourceTableFilter
WITH (STATE = OFF);
ALTER SECURITY POLICY SourceTableFilter
WITH (STATE = ON);

With the security policy enabled on the SQL Server side, Kusto users see only the records matching their Microsoft Entra IDs when querying the SQL external table. With the security policy disabled, all users can access the full table content.

9.5.2 - Azure Storage external tables

9.5.2.1 - Create and alter Azure Storage delta external tables

This article describes how to create and alter delta external tables

The commands in this article can be used to create or alter a delta external table in the database from which the command is executed. A delta external table references Delta Lake table data located in Azure Blob Storage, Azure Data Lake Store Gen1, or Azure Data Lake Store Gen2.

To accelerate queries over external delta tables, see Query acceleration policy.

Permissions

To .create requires at least Database User permissions, and to .alter requires at least Table Admin permissions.

To .create-or-alter an external table using managed identity authentication requires AllDatabasesAdmin permissions.

Syntax

(.create | .alter | .create-or-alter) external table TableName [(Schema)] kind = delta (StorageConnectionString ) [with (Property [, …])]

Parameters

NameTypeRequiredDescription
TableNamestring✔️An external table name that adheres to the entity names rules. An external table can’t have the same name as a regular table in the same database.
SchemastringThe optional external data schema is a comma-separated list of one or more column names and data types, where each item follows the format: ColumnName : ColumnType. If not specified, it will be automatically inferred from the delta log based on the latest delta table version.
StorageConnectionStringstring✔️delta table root folder path, including credentials. Can point to Azure Blob Storage blob container, Azure Data Lake Gen 2 file system or Azure Data Lake Gen 1 container. The external table storage type is determined by the provided connection string. See storage connection strings.
PropertystringA key-value property pair in the format PropertyName = PropertyValue. See optional properties.

Authentication and authorization

The authentication method to access an external table is based on the connection string provided during its creation, and the permissions required to access the table vary depending on the authentication method.

The supported authentication methods are the same as those supported by Azure Storage external tables.

Optional properties

PropertyTypeDescription
folderstringTable’s folder
docStringstringString documenting the table
compressedboolOnly relevant for the export scenario.
If set to true, the data is exported in the format specified by the compressionType property. For the read path, compression is automatically detected.
compressionTypestringOnly relevant for the export scenario.
The compression type of exported files. For non-Parquet files, only gzip is allowed. For Parquet files, possible values include gzip, snappy, lz4_raw, brotli, and zstd. Default is gzip. For the read path, compression type is automatically detected.
namePrefixstringIf set, specifies the prefix of the files. On write operations, all files will be written with this prefix. On read operations, only files with this prefix are read.
fileExtensionstringIf set, specifies extension of the files. On write, files names will end with this suffix. On read, only files with this file extension will be read.
encodingstringSpecifies how the text is encoded: UTF8NoBOM (default) or UTF8BOM.
dryRunboolIf set, the external table definition isn’t persisted. This option is useful for validating the external table definition, especially in conjunction with the filesPreview or sampleUris parameter.

Examples

Create or alter a delta external table with an inferred schema

In the following external table, the schema is automatically inferred from the latest delta table version.

.create-or-alter external table ExternalTable  
kind=delta 
( 
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
) 

Create a delta external table with a custom schema

In the following external table, a custom schema is specified and overrides the schema of the delta table. If, at some later time, you need to replace the custom schema with the schema based on the latest delta table version, run the .alter | .create-or-alter command without specifying a schema, like in the previous example.

.create external table ExternalTable (Timestamp:datetime, x:long, s:string) 
kind=delta
( 
   h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
)
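
Once created, the delta external table is queried with the external_table() function, like any other external table. A short usage sketch against the custom-schema table above:

external_table("ExternalTable")
| where Timestamp > ago(7d)
| take 10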

Limitations

  • Time travel is not supported. Only the latest delta table version is used.

9.5.2.2 - Create and alter Azure Storage external tables

This article describes how to create and alter external tables based on Azure Blob Storage or Azure Data Lake

The commands in this article can be used to create or alter an Azure Storage external table in the database from which the command is executed. An Azure Storage external table references data located in Azure Blob Storage, Azure Data Lake Store Gen1, or Azure Data Lake Store Gen2.

Permissions

To .create requires at least Database User permissions, and to .alter requires at least Table Admin permissions.

To .create-or-alter an external table using managed identity authentication requires AllDatabasesAdmin permissions.

Syntax

(.create | .alter | .create-or-alter) external table TableName (Schema) kind = storage [partition by (Partitions) [pathformat = (PathFormat)]] dataformat = DataFormat (StorageConnectionString [, …] ) [with (Property [, …])]

Parameters

NameTypeRequiredDescription
TableNamestring✔️An external table name that adheres to the entity names rules. An external table can’t have the same name as a regular table in the same database.
Schemastring✔️The external data schema is a comma-separated list of one or more column names and data types, where each item follows the format: ColumnName : ColumnType. If the schema is unknown, use infer_storage_schema to infer the schema based on external file contents.
PartitionsstringA comma-separated list of columns by which the external table is partitioned. Partition column can exist in the data file itself, or as part of the file path. See partitions formatting to learn how this value should look.
PathFormatstringAn external data folder URI path format to use with partitions. See path format.
DataFormatstring✔️The data format, which can be any of the ingestion formats. We recommend using the Parquet format for external tables to improve query and export performance, unless you use JSON paths mapping. When using an external table for export scenario, you’re limited to the following formats: CSV, TSV, JSON and Parquet.
StorageConnectionStringstring✔️One or more comma-separated paths to Azure Blob Storage blob containers, Azure Data Lake Gen 2 file systems or Azure Data Lake Gen 1 containers, including credentials. The external table storage type is determined by the provided connection strings. See storage connection strings.
PropertystringA key-value property pair in the format PropertyName = PropertyValue. See optional properties.

Authentication and authorization

The authentication method to access an external table is based on the connection string provided during its creation, and the permissions required to access the table vary depending on the authentication method.

The following table lists the supported authentication methods for Azure Storage external tables and the permissions needed to read or write to the table.

Authentication methodAzure Blob Storage / Data Lake Storage Gen2Data Lake Storage Gen1
ImpersonationRead permissions: Storage Blob Data Reader
Write permissions: Storage Blob Data Contributor
Read permissions: Reader
Write permissions: Contributor
Managed identityRead permissions: Storage Blob Data Reader
Write permissions: Storage Blob Data Contributor
Read permissions: Reader
Write permissions: Contributor
Shared Access (SAS) tokenRead permissions: List + Read
Write permissions: Write
This authentication method isn’t supported in Gen1.
Microsoft Entra access tokenNo additional permissions required.No additional permissions required.
Storage account access keyNo additional permissions required.This authentication method isn’t supported in Gen1.

Path format

The PathFormat parameter allows you to specify the format for the external data folder URI path in addition to partitions. It consists of a sequence of partition elements and text separators. A partition element refers to a partition that is declared in the partition by clause, and the text separator is any text enclosed in quotes. Consecutive partition elements must be set apart using the text separator.

[ StringSeparator ] Partition [ StringSeparator ] [Partition [ StringSeparator ] …]

To construct the original file path prefix, partition elements are rendered as strings and separated with corresponding text separators. You can use the datetime_pattern macro (datetime_pattern(DateTimeFormat, PartitionName)) to specify the format used for rendering a datetime partition value. The macro adheres to the .NET format specification, and allows format specifiers to be enclosed in curly brackets. For example, the following two formats are equivalent:

  • 'year='yyyy'/month='MM
  • year={yyyy}/month={MM}

By default, datetime values are rendered using the following formats:

Partition functionDefault format
startofyearyyyy
startofmonthyyyy/MM
startofweekyyyy/MM/dd
startofdayyyyy/MM/dd
bin(Column, 1d)yyyy/MM/dd
bin(Column, 1h)yyyy/MM/dd/HH
bin(Column, 1m)yyyy/MM/dd/HH/mm

Virtual columns

When data is exported from Spark, partition columns (that are provided to the dataframe writer’s partitionBy method) aren’t written to data files. This process avoids data duplication because the data is already present in the folder names (for example, column1=<value>/column2=<value>/), and Spark can recognize it upon read.

External tables support reading this data in the form of virtual columns. Virtual columns can be of either type string or datetime, and are specified using the following syntax:

.create external table ExternalTable (EventName:string, Revenue:double)  
kind=storage  
partition by (CustomerName:string, Date:datetime)  
pathformat=("customer=" CustomerName "/date=" datetime_pattern("yyyyMMdd", Date))  
dataformat=parquet
( 
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)

To filter by virtual columns in a query, specify partition names in query predicate:

external_table("ExternalTable")
 | where Date between (datetime(2020-01-01) .. datetime(2020-02-01))
 | where CustomerName in ("John.Doe", "Ivan.Ivanov")

Optional properties

PropertyTypeDescription
folderstringTable’s folder
docStringstringString documenting the table
compressedboolOnly relevant for the export scenario.
If set to true, the data is exported in the format specified by the compressionType property. For the read path, compression is automatically detected.
compressionTypestringOnly relevant for the export scenario.
The compression type of exported files. For non-Parquet files, only gzip is allowed. For Parquet files, possible values include gzip, snappy, lz4_raw, brotli, and zstd. Default is gzip. For the read path, compression type is automatically detected.
includeHeadersstringFor delimited text formats (CSV, TSV, …), specifies whether files contain a header. Possible values are: All (all files contain a header), FirstFile (first file in a folder contains a header), None (no files contain a header).
namePrefixstringIf set, specifies the prefix of the files. On write operations, all files will be written with this prefix. On read operations, only files with this prefix are read.
fileExtensionstringIf set, specifies the extension of the files. On write, files names will end with this suffix. On read, only files with this file extension will be read.
encodingstringSpecifies how the text is encoded: UTF8NoBOM (default) or UTF8BOM.
sampleUrisboolIf set, the command result provides several examples of simulated external data files URI as they’re expected by the external table definition. This option helps validate whether the Partitions and PathFormat parameters are defined properly.
filesPreviewboolIf set, one of the command result tables contains a preview of the .show external table artifacts command output. Like sampleUris, this option helps validate the Partitions and PathFormat parameters of the external table definition.
validateNotEmptyboolIf set, the connection strings are validated for having content in them. The command will fail if the specified URI location doesn’t exist, or if there are insufficient permissions to access it.
dryRunboolIf set, the external table definition isn’t persisted. This option is useful for validating the external table definition, especially in conjunction with the filesPreview or sampleUris parameter.

File filtering logic

When querying an external table, performance is improved by filtering out irrelevant external storage files. The process of iterating files and deciding whether a file should be processed is as follows:

  1. Build a URI pattern that represents a place where files are found. Initially, the URI pattern equals a connection string provided as part of the external table definition. If there are any partitions defined, they’re rendered using PathFormat, then appended to the URI pattern.

  2. For all files found under the URI pattern(s) created, check that:

    • Partition values match predicates used in a query.
    • Blob name starts with NamePrefix, if such a property is defined.
    • Blob name ends with FileExtension, if such a property is defined.

Once all the conditions are met, the file is fetched and processed.

Examples

Non-partitioned external table

In the following non-partitioned external table, the files are expected to be placed directly under the container(s) defined:

.create external table ExternalTable (x:long, s:string)  
kind=storage 
dataformat=csv 
( 
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey' 
) 

Partitioned by date

In the following external table partitioned by date, the files are expected to be placed under directories of the default datetime format yyyy/MM/dd:

.create external table ExternalTable (Timestamp:datetime, x:long, s:string) 
kind=storage
partition by (Date:datetime = bin(Timestamp, 1d)) 
dataformat=csv 
( 
   h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
)

Partitioned by month

In the following external table partitioned by month, the directory format is year=yyyy/month=MM:

.create external table ExternalTable (Timestamp:datetime, x:long, s:string) 
kind=storage 
partition by (Month:datetime = startofmonth(Timestamp)) 
pathformat=(datetime_pattern("'year='yyyy'/month='MM", Month)) 
dataformat=csv 
( 
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey' 
) 

Partitioned by name and date

In the following external table, the data is partitioned first by customer name and then by date, meaning that the expected directory structure is, for example, customer_name=Softworks/2019/02/01:

.create external table ExternalTable (Timestamp:datetime, CustomerName:string) 
kind=storage 
partition by (CustomerNamePart:string = CustomerName, Date:datetime = startofday(Timestamp)) 
pathformat=("customer_name=" CustomerNamePart "/" Date)
dataformat=csv 
(  
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey' 
)

Partitioned by hash and date

The following external table is partitioned first by customer name hash (modulo ten), then by date. The expected directory structure is, for example, customer_id=5/dt=20190201, and data file names end with the .txt extension:

.create external table ExternalTable (Timestamp:datetime, CustomerName:string) 
kind=storage 
partition by (CustomerId:long = hash(CustomerName, 10), Date:datetime = startofday(Timestamp)) 
pathformat=("customer_id=" CustomerId "/dt=" datetime_pattern("yyyyMMdd", Date)) 
dataformat=csv 
( 
   h@'https://storageaccount.blob.core.windows.net/container1;secretKey'
)
with (fileExtension = ".txt")

Filter by partition columns in a query

To filter by partition columns in a query, specify original column name in query predicate:

external_table("ExternalTable")
 | where Timestamp between (datetime(2020-01-01) .. datetime(2020-02-01))
 | where CustomerName in ("John.Doe", "Ivan.Ivanov")

Sample Output

TableNameTableTypeFolderDocStringPropertiesConnectionStringsPartitionsPathFormat
ExternalTableBlobExternalTablesDocs{“Format”:“Csv”,“Compressed”:false,“CompressionType”:null,“FileExtension”:null,“IncludeHeaders”:“None”,“Encoding”:null,“NamePrefix”:null}[“https://storageaccount.blob.core.windows.net/container1;*******”][{“Mod”:10,“Name”:“CustomerId”,“ColumnName”:“CustomerName”,“Ordinal”:0},{“Function”:“StartOfDay”,“Name”:“Date”,“ColumnName”:“Timestamp”,“Ordinal”:1}]“customer_id=” CustomerId “/dt=” datetime_pattern(“yyyyMMdd”,Date)

9.6 - Functions

9.6.1 - Stored functions management overview

This article describes Stored functions management overview.

This section describes management commands used for creating and altering user-defined functions:

FunctionDescription
.alter functionAlters an existing function and stores it inside the database metadata
.alter function docstringAlters the DocString value of an existing function
.alter function folderAlters the Folder value of an existing function
.create functionCreates a stored function
.create-or-alter functionCreates a stored function or alters an existing function and stores it inside the database metadata
.drop function and .drop functionsDrops a function (or functions) from the database
.show functions and .show functionLists all the stored functions, or a specific function, in the currently-selected database

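For example, a minimal sketch of .create-or-alter function and .show function, with placeholder names:

// Creates (or updates) a stored scalar function with a docstring and folder, then shows its properties.
.create-or-alter function with (docstring = 'Doubles the input value', folder = 'Samples')
    DoubleIt(x: long) { x * 2 }

.show function DoubleIt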

9.7 - Ingestion mappings

9.7.1 - AVRO Mapping

Learn how to use AVRO mapping to map data to columns inside tables upon ingestion.

Use AVRO mapping to map incoming data to columns inside tables when your ingestion source file is in AVRO format.

Each AVRO mapping element can contain any of the following optional properties:

PropertyTypeDescription
FieldstringName of the field in the AVRO record.
PathstringIf the value starts with $, it’s treated as the path to the field in the AVRO document. This path specifies the part of the AVRO document that becomes the content of the column in the table. The path that denotes the entire AVRO record is $. If the value doesn’t start with $, it’s treated as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax.
ConstValuestringThe constant value to be used for a column instead of some value inside the AVRO file.
TransformstringTransformation that should be applied on the content with mapping transformations.

Examples

JSON serialization

The following example mapping is serialized as a JSON string when provided as part of the .ingest management command.

[
  {"Column": "event_timestamp", "Properties": {"Field": "Timestamp"}},
  {"Column": "event_name",      "Properties": {"Field": "Name"}},
  {"Column": "event_type",      "Properties": {"Field": "Type"}},
  {"Column": "event_time",      "Properties": {"Field": "Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

Here the serialized JSON mapping is included in the context of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "AVRO",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Field": "Field1"}},
        {"Column": "column_b", "Properties": {"Field": "$.[\'Field name with space\']"}}
      ]
      ```
  )

Precreated mapping

When the mapping is precreated, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="AVRO",
        ingestionMappingReference = "Mapping_Name"
    )

Identity mapping

Use AVRO mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="AVRO"
    )

9.7.2 - CSV Mapping

Learn how to use CSV mapping to map data to columns inside tables upon ingestion.

Use CSV mapping to map incoming data to columns inside tables when your ingestion source file is any of the following delimiter-separated tabular formats: CSV, TSV, PSV, SCSV, SOHsv, TXT and RAW. For more information, see supported data formats.

Each CSV mapping element can contain any of the following optional properties:

PropertyTypeDescription
OrdinalintThe column order number in CSV.
ConstValuestringThe constant value to be used for a column instead of some value inside the CSV file.
TransformstringTransformation that should be applied on the content with mapping transformations. The only supported transformation is SourceLocation.

Examples

[
  {"Column": "event_time", "Properties": {"Ordinal": "0"}},
  {"Column": "event_name", "Properties": {"Ordinal": "1"}},
  {"Column": "event_type", "Properties": {"Ordinal": "2"}},
  {"Column": "ingestion_time", "Properties": {"ConstValue": "2023-01-01T10:32:00"}}
  {"Column": "source_location", "Properties": {"Transform": "SourceLocation"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv",
        ingestionMapping =
        ```
        [
            {"Column": "event_time", "Properties": {"Ordinal": "0"}},
            {"Column": "event_name", "Properties": {"Ordinal": "1"}},
            {"Column": "event_type", "Properties": {"Ordinal": "2"}},
            {"Column": "ingestion_time", "Properties": {"ConstValue": "2023-01-01T10:32:00"}},
            {"Column": "source_location", "Properties": {"Transform": "SourceLocation"}}
        ]
        ```
    )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv",
        ingestionMappingReference = "MappingName"
    )

Identity mapping

Use CSV mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="csv"
    )

9.7.3 - Ingestion mappings

This article describes ingestion mappings.

Ingestion mappings are used during ingestion to map incoming data to columns inside tables.

Data Explorer supports different types of mappings, both row-oriented (CSV, JSON, AVRO and W3CLOGFILE), and column-oriented (Parquet and ORC).

Ingestion mappings can be defined in the ingest command, or can be precreated and referenced from the ingest command using ingestionMappingReference parameters. Ingestion is possible without specifying a mapping. For more information, see identity mapping.

Each element in the mapping list is constructed from three fields:

PropertyRequiredDescription
Column✔️Target column name in the table.
DatatypeDatatype with which to create the mapped column if it doesn’t already exist in the table.
PropertiesProperty-bag containing properties specific for each mapping as described in each specific mapping type page.
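
For example, the following sketch precreates a CSV mapping that uses all three fields; the table name, mapping name, and columns are placeholders. The Datatype field would add the column if it doesn’t already exist in the table.

.create table MyTable ingestion csv mapping 'MyCsvMapping'
```
[
  {"Column": "event_time", "Datatype": "datetime", "Properties": {"Ordinal": "0"}},
  {"Column": "event_name", "Datatype": "string", "Properties": {"Ordinal": "1"}}
]
```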

Supported mapping types

The following table defines mapping types to be used when ingesting or querying external data of a specific format.

Data FormatMapping Type
CSVCSV Mapping
TSVCSV Mapping
TSVeCSV Mapping
PSVCSV Mapping
SCSVCSV Mapping
SOHsvCSV Mapping
TXTCSV Mapping
RAWCSV Mapping
JSONJSON Mapping
AVROAVRO Mapping
APACHEAVROAVRO Mapping
ParquetParquet Mapping
ORCORC Mapping
W3CLOGFILEW3CLOGFILE Mapping

Ingestion mapping examples

The following examples use the RawEvents table with the following schema:

.create table RawEvents (timestamp: datetime, deviceId: guid, messageId: guid, temperature: decimal, humidity: decimal) 

Simple mapping

The following example shows ingestion where the mapping is defined in the ingest command. The command ingests a JSON file from a URL into the RawEvents table. The mapping specifies the path to each field in the JSON file.

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
    with (
            format = "json",
            ingestionMapping =
            ```
            [ 
              {"column":"timestamp","Properties":{"path":"$.timestamp"}},
              {"column":"deviceId","Properties":{"path":"$.deviceId"}},
              {"column":"messageId","Properties":{"path":"$.messageId"}},
              {"column":"temperature","Properties":{"path":"$.temperature"}},
              {"column":"humidity","Properties":{"path":"$.humidity"}}
            ]
            ```
          )

Mapping with ingestionMappingReference

To map the same JSON file using a precreated mapping, create the RawEventMapping ingestion mapping reference with the following command:

.create table RawEvents ingestion json mapping 'RawEventMapping' 
  ```
  [ 
    {"column":"timestamp","Properties":{"path":"$.timestamp"}},
    {"column":"deviceId","Properties":{"path":"$.deviceId"}},
    {"column":"messageId","Properties":{"path":"$.messageId"}},
    {"column":"temperature","Properties":{"path":"$.temperature"}},
    {"column":"humidity","Properties":{"path":"$.humidity"}}
  ]
  ```

Ingest the JSON file using the RawEventMapping ingestion mapping reference with the following command:

.ingest into table RawEvents ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') 
  with (
          format="json",
          ingestionMappingReference="RawEventMapping"
        )

Identity mapping

Ingestion is possible without specifying ingestionMapping or ingestionMappingReference properties. The data is mapped using an identity data mapping derived from the table’s schema. The table schema remains the same. The format property should be specified. See ingestion formats.

Format typeFormatMapping logic
Tabular data formats with defined order of columns, such as delimiter-separated or single-line formats.CSV, TSV, TSVe, PSV, SCSV, Txt, SOHsv, RawAll table columns are mapped in their respective order to data columns in the order they appear in the data source. Column data type is taken from the table schema.
Formats with named columns or records with named fields.JSON, Parquet, Avro, ApacheAvro, Orc, W3CLOGFILEAll table columns are mapped to data columns or record fields having the same name (case-sensitive). Column data type is taken from the table schema.

Mapping transformations

Some of the data format mappings (Parquet, JSON, and AVRO) support simple and useful ingest-time transformations. Where the scenario requires more complex processing at ingest time, use Update policy, which allows you to define lightweight processing using a KQL expression.

Path-dependent transformationDescriptionConditions
PropertyBagArrayToDictionaryTransforms JSON array of properties, such as {events:[{"n1":"v1"},{"n2":"v2"}]}, to dictionary and serializes it to valid JSON document, such as {"n1":"v1","n2":"v2"}.Available for JSON, Parquet, AVRO, and ORC mapping types.
SourceLocationName of the storage artifact that provided the data, type string (for example, the blob’s “BaseUri” field).Available for CSV, JSON, Parquet, AVRO, ORC, and W3CLOGFILE mapping types.
SourceLineNumberOffset relative to that storage artifact, type long (starting with ‘1’ and incrementing per new record).Available for CSV, JSON, Parquet, AVRO, ORC, and W3CLOGFILE mapping types.
DateTimeFromUnixSecondsConverts number representing unix-time (seconds since 1970-01-01) to UTC datetime string.Available for CSV, JSON, Parquet, AVRO, and ORC mapping types.
DateTimeFromUnixMillisecondsConverts number representing unix-time (milliseconds since 1970-01-01) to UTC datetime string.Available for CSV, JSON, Parquet, AVRO, and ORC mapping types.
DateTimeFromUnixMicrosecondsConverts number representing unix-time (microseconds since 1970-01-01) to UTC datetime string.Available for CSV, JSON, Parquet, AVRO, and ORC mapping types.
DateTimeFromUnixNanosecondsConverts number representing unix-time (nanoseconds since 1970-01-01) to UTC datetime string.Available for CSV, JSON, Parquet, AVRO, and ORC mapping types.
DropMappedFieldsMaps an object in the JSON document to a column and removes any nested fields already referenced by other column mappings.Available for JSON, Parquet, AVRO, and ORC mapping types.
BytesAsBase64Treats the data as byte array and converts it to a base64-encoded string.Available for AVRO mapping type. For ApacheAvro format, the schema type of the mapped data field should be bytes or fixed Avro type. For Avro format, the field should be an array containing byte values from [0-255] range. null is ingested if the data doesn’t represent a valid byte array.

Mapping transformation examples

DropMappedFields transformation:

Given the following JSON contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "EventName": "CustomEvent",
        "Revenue": 0.456
    }
}

The following data mapping maps the entire Props object into the dynamic column Props while excluding already mapped columns (Props.EventName is already mapped into column EventName, so it’s excluded).

[
    { "Column": "Time", "Properties": { "Path": "$.Time" } },
    { "Column": "EventName", "Properties": { "Path": "$.Props.EventName" } },
    { "Column": "Props", "Properties": { "Path": "$.Props", "Transform":"DropMappedFields" } },
]

The ingested data looks as follows:

TimeEventNameProps
2012-01-15T10:45CustomEvent{"Revenue": 0.456}

BytesAsBase64 transformation

Given the following AVRO file contents:

{
    "Time": "2012-01-15T10:45",
    "Props": {
        "id": [227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52]
    }
}

The following data mapping maps the id field into two columns, with and without the transformation.

[
    { "Column": "ID", "Properties": { "Path": "$.props.id" } },
    { "Column": "Base64EncodedId", "Properties": { "Path": "$.props.id", "Transform":"BytesAsBase64" } },
]

The ingested data looks as follows:

IDBase64EncodedId
[227,131,34,92,28,91,65,72,134,138,9,133,51,45,104,52]44MiXBxbQUiGigmFMy1oNA==
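
DateTimeFromUnixSeconds transformation

The following sketch is illustrative (the column and field names aren’t taken from the examples above). Given the following JSON contents:

{
    "EventTime": 1672531200,
    "EventName": "Login"
}

The following data mapping converts the unix-time value in EventTime to a UTC datetime, and maps EventName as-is:

[
    { "Column": "EventTime", "Properties": { "Path": "$.EventTime", "Transform":"DateTimeFromUnixSeconds" } },
    { "Column": "EventName", "Properties": { "Path": "$.EventName" } }
]

The ingested data would look as follows (1672531200 seconds corresponds to 2023-01-01 00:00:00 UTC):

EventTimeEventName
2023-01-01 00:00:00.0000000Login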

9.7.4 - JSON Mapping

Learn how to use JSON mapping to map data to columns inside tables upon ingestion.

Use JSON mapping to map incoming data to columns inside tables when your ingestion source file is in JSON format.

Each JSON mapping element must contain one of the following optional properties:

PropertyTypeDescription
PathstringIf the value starts with $ it’s interpreted as the JSON path to the field in the JSON document that will become the content of the column in the table. The JSON path that denotes the entire document is $. If the value doesn’t start with $ it’s interpreted as a constant value. JSON paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax.
ConstValuestringThe constant value to be used for a column instead of some value inside the JSON file.
TransformstringTransformation that should be applied on the content with mapping transformations.

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "source_uri",      "Properties": {"Transform": "SourceLocation"}},
  {"Column": "source_line",     "Properties": {"Transform": "SourceLineNumber"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "json",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Path": "$.Obj.Property"}},
        {"Column": "column_b", "Properties": {"Path": "$.Property"}},
        {"Column": "custom_column", "Properties": {"Path": "$.[\'Property name with space\']"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="json",
        ingestionMappingReference = "Mapping_Name"
    )

Identity mapping

Use JSON mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="json"
    )

Copying JSON mapping

You can copy JSON mapping of an existing table and create a new table with the same mapping using the following process:

  1. Run the following command on the table whose mapping you want to copy:

    .show table TABLENAME ingestion json mappings
    | extend formatted_mapping = strcat("'",replace_string(Mapping, "'", "\\'"),"'")
    | project formatted_mapping
    
  2. Use the output of the above command to create a new table with the same mapping:

    .create table TABLENAME ingestion json mapping "TABLENAME_Mapping" RESULT_OF_ABOVE_CMD
    

9.7.5 - ORC Mapping

Learn how to use ORC mapping to map data to columns inside tables upon ingestion.

Use ORC mapping to map incoming data to columns inside tables when your ingestion source file is in ORC format.

Each ORC mapping element must contain one of the following optional properties:

PropertyTypeDescription
FieldstringName of the field in the ORC record.
PathstringIf the value starts with $ it’s interpreted as the path to the field in the ORC document that will become the content of the column in the table. The path that denotes the entire ORC record is $. If the value doesn’t start with $ it’s interpreted as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax.
ConstValuestringThe constant value to be used for a column instead of some value inside the ORC file.
TransformstringTransformation that should be applied on the content with mapping transformations.

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "orc",
      ingestionMapping =
      ```
      [
        {"Column": "column_a", "Properties": {"Path": "$.Field1"}},
        {"Column": "column_b", "Properties": {"Path": "$.[\'Field name with space\']"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="orc",
        ingestionMappingReference = "ORC_Mapping"
    )

Identity mapping

Use ORC mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="orc"
    )

9.7.6 - Parquet Mapping

Learn how to use Parquet mapping to map data to columns inside tables upon ingestion and optimize data processing in Kusto.

Use Parquet mapping to map incoming data to columns inside tables when your ingestion source file is in Parquet format.

Each Parquet mapping element must contain one of the following optional properties:

PropertyTypeDescription
FieldstringName of the field in the Parquet record.
PathstringIf the value starts with $ it’s interpreted as the path to the field in the Parquet document that will become the content of the column in the table. The path that denotes the entire Parquet record is $. If the value doesn’t start with $ it’s interpreted as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax.
ConstValuestringThe constant value to be used for a column instead of some value inside the Parquet file.
TransformstringTransformation that should be applied on the content with mapping transformations.

Parquet type conversions

Comprehensive support is provided for converting data types when you’re ingesting or querying data from a Parquet source.

The following table provides a mapping of Parquet field types, and the table column types they can be converted to. The first column lists the Parquet type, and the others show the table column types they can be converted to.

Parquet typeboolintlongrealdecimaldatetimetimespanstringguiddynamic
INT8✔️✔️✔️✔️✔️✔️
INT16✔️✔️✔️✔️✔️✔️
INT32✔️✔️✔️✔️✔️✔️
INT64✔️✔️✔️✔️✔️✔️
UINT8✔️✔️✔️✔️✔️✔️
UINT16✔️✔️✔️✔️✔️✔️
UINT32✔️✔️✔️✔️✔️✔️
UINT64✔️✔️✔️✔️✔️
FLOAT32✔️✔️✔️✔️✔️✔️
FLOAT64✔️✔️✔️✔️✔️✔️
BOOLEAN✔️✔️✔️
DECIMAL (I32)✔️✔️✔️✔️✔️✔️
DECIMAL (I64)✔️✔️✔️✔️✔️✔️
DECIMAL (FLBA)✔️✔️✔️✔️
DECIMAL (BA)✔️✔️✔️✔️✔️✔️
TIMESTAMP✔️✔️
DATE✔️✔️
STRING✔️✔️✔️✔️✔️✔️
UUID✔️✔️
JSON✔️✔️
LIST✔️
MAP✔️
STRUCT✔️

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format = "parquet",
    ingestionMapping =
    ```
    [
      {"Column": "column_a", "Properties": {"Path": "$.Field1.Subfield"}},
      {"Column": "column_b", "Properties": {"Path": "$.[\'Field name with space\']"}},
    ]
    ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format="parquet",
      ingestionMappingReference = "Mapping_Name"
  )

Identity mapping

Use Parquet mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format="parquet"
  )

9.7.7 - W3CLOGFILE Mapping

Learn how to use W3CLOGFILE mapping to map data to columns inside tables upon ingestion.

Use W3CLOGFILE mapping to map incoming data to columns inside tables when your ingestion source file is in W3CLOGFILE format.

Each W3CLOGFILE mapping element must contain one of the following optional properties:

PropertyTypeDescription
FieldstringName of the field in the W3CLOGFILE log record.
ConstValuestringThe constant value to be used for a column instead of some value inside the W3CLOGFILE file.
TransformstringTransformation that should be applied on the content with mapping transformations.

Examples

[
   {"Column": "Date",          "Properties": {"Field": "date"}},
   {"Column": "Time",          "Properties": {"Field": "time"}},
   {"Column": "IP",            "Properties": {"Field": "s-ip"}},
   {"Column": "ClientMethod",  "Properties": {"Field": "cs-method"}},
   {"Column": "ClientQuery",   "Properties": {"Field": "cs-uri-query"}},
   {"Column": "ServerPort",    "Properties": {"Field": "s-port"}},
   {"Column": "ClientIP",      "Properties": {"Field": "c-ip"}},
   {"Column": "UserAgent",     "Properties": {"Field": "cs(User-Agent)"}},
   {"Column": "Referer",       "Properties": {"Field": "cs(Referer)"}},
   {"Column": "Status",        "Properties": {"Field": "sc-status"}},
   {"Column": "ResponseBytes", "Properties": {"Field": "sc-bytes"}},
   {"Column": "RequestBytes",  "Properties": {"Field": "cs-bytes"}},
   {"Column": "TimeTaken",     "Properties": {"Field": "time-taken"}}
]

The mapping above is serialized as a JSON string when it’s provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format = "w3clogfile",
      ingestionMapping =
      ```
      [
         {"Column": "column_a", "Properties": {"Field": "field1"}},
         {"Column": "column_b", "Properties": {"Field": "field2"}}
      ]
      ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="w3clogfile",
        ingestionMappingReference = "Mapping_Name"
    )

Identity mapping

Use W3CLOGFILE mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
    with
    (
        format="w3clogfile"
    )

9.8 - Manage external table mappings

9.9 - Materialized views

9.9.1 - Materialized views

This article describes materialized views.

Materialized views expose an aggregation query over a source table, or over another materialized view.

Materialized views always return an up-to-date result of the aggregation query (always fresh). Querying a materialized view is more performant than running the aggregation directly over the source table.

Why use materialized views?

By investing resources (data storage, background CPU cycles) for materialized views of commonly used aggregations, you get the following benefits:

  • Performance improvement: Querying a materialized view commonly performs better than querying the source table for the same aggregation function(s).

  • Freshness: A materialized view query always returns the most up-to-date results, independent of when materialization last took place. The query combines the materialized part of the view with the records in the source table, which haven’t yet been materialized (the delta part), always providing the most up-to-date results.

  • Cost reduction: Querying a materialized view consumes less resources than doing the aggregation over the source table. Retention policy of source table can be reduced if only aggregation is required. This setup reduces hot cache costs for the source table.

For example use cases, see Materialized view use cases.

How materialized views work

A materialized view is made of two components:

  • A materialized part - a table holding aggregated records from the source table, which have already been processed. This table always holds a single record per the aggregation’s group-by combination.
  • A delta - the newly ingested records in the source table that haven’t yet been processed.

Querying the materialized view combines the materialized part with the delta part, providing an up-to-date result of the aggregation query. The offline materialization process ingests new records from the delta to the materialized table, and updates existing records. If the intersection between the delta and the materialized part is large, and many records require updates, this might have a negative impact on the materialization process. See monitor materialized views on how to troubleshoot such situations.

Materialized views queries

There are two ways to query a materialized view:

  • Query the entire view: when you query the materialized view by its name, similarly to querying a table, the materialized view query combines the materialized part of the view with the records in the source table that haven’t been materialized yet (the delta).

    • Querying the materialized view always returns the most up-to-date results, based on all records ingested to the source table. For more information about the materialized vs. non-materialized parts in materialized view, see how materialized views work.
    • This option might not perform best as it needs to materialize the delta part during query time. Performance in this case depends on the view’s age and the filters applied in the query. The materialized view query optimizer section includes possible ways to improve query performance when querying the entire view.
  • Query the materialized part only: another way of querying the view is by using the materialized_view() function. This option supports querying only the materialized part of the view, while specifying the max latency the user is willing to tolerate.

    • This option isn’t guaranteed to return the most up-to-date records, but it should always be more performant than querying the entire view.
    • This function is useful for scenarios in which you’re willing to sacrifice some freshness for performance, for example for telemetry dashboards.
  • Materialized views participate in cross-cluster or cross-database queries, but aren’t included in wildcard unions or searches.

    • The following examples all include materialized views by the name ViewName:
    cluster('cluster1').database('db').ViewName
    cluster('cluster1').database('*').ViewName
    database('*').ViewName
    database('DB*').ViewName
    database('*').materialized_view('ViewName')
    database('DB*').materialized_view('ViewName')
    
    • The following examples do not include records from materialized views:
    cluster('cluster1').database('db').*
    database('*').View*
    search in (*)
    search * 
    

Materialized view query optimizer

When querying the entire view, the materialized part is combined with the delta during query time. This includes aggregating the delta and joining it with the materialized part.

  • Querying the entire view performs better if the query includes filters on the group by keys of the materialized view query. See more tips about how to create your materialized view, based on your query pattern, in the .create materialized-view performance tips section.
  • The query optimizer chooses summarize/join strategies that are expected to improve query performance. For example, the decision on whether to shuffle the query is based on the number of records in the delta part. The following client request properties provide some control over the optimizations applied. You can test these properties with your materialized view queries and evaluate their impact on query performance.
Client request property nameTypeDescription
materialized_view_query_optimization_costbased_enabledboolIf set to false, disables summarize/join optimizations in materialized view queries. Uses default strategies. Default is true.
materialized_view_shuffledynamicForce shuffling of the materialized view query, and (optionally) provide specific keys to shuffle by. See examples below.

ingestion_time() function in the context of materialized views

The ingestion_time() function returns null values when used in the context of a materialized view, if querying the entire view. When querying the materialized part of the view, the return value depends on the type of materialized view:

  • In materialized views which include a single arg_max()/arg_min()/take_any() aggregation, the ingestion_time() is equal to the ingestion_time() of the corresponding record in the source table.
  • In all other materialized views, the value of ingestion_time() is approximately the time of materialization (see how materialized views work).

Examples

  1. Query the entire view. The most recent records in source table are included:

    ViewName
    
  2. Query the materialized part of the view only, regardless of when it was last materialized.

    materialized_view("ViewName")
    
  3. Query the entire view, and provide a “hint” to use shuffle strategy. The most recent records in source table are included:

    • Example #1: shuffle based on the Id column (similarly to using hint.shufflekey=Id):
    set materialized_view_shuffle = dynamic([{"Name" : "ViewName", "Keys" : [ "Id" ] }]);
    ViewName
    
    • Example #2: shuffle based on all keys (similarly to using hint.strategy=shuffle):
    set materialized_view_shuffle = dynamic([{"Name" : "ViewName" }]);
    ViewName
    
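
  4. Query only the materialized part of the view while tolerating results that are up to a given age. This sketch assumes the optional max-latency argument of materialized_view() described above; the 10m value is illustrative:

    materialized_view("ViewName", 10m)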

Performance considerations

The main contributors that can impact a materialized view health are:

  • Cluster resources: Like any other process running on the cluster, materialized views consume resources (CPU, memory) from the cluster. If the cluster is overloaded, adding materialized views to it may cause a degradation in the cluster’s performance. Monitor your cluster’s health using cluster health metrics. Optimized autoscale currently doesn’t take materialized view health into consideration as part of autoscale rules.

  • Overlap with materialized data: During materialization, all new records ingested to the source table since the last materialization (the delta) are processed and materialized into the view. The higher the intersection between new records and already materialized records is, the worse the performance of the materialized view will be. A materialized view works best if the number of records being updated (for example, in arg_max view) is a small subset of the source table. If all or most of the materialized view records need to be updated in every materialization cycle, then the materialized view might not perform well.

  • Ingestion rate: There are no hard-coded limits on the data volume or ingestion rate in the source table of the materialized view. However, the recommended ingestion rate for materialized views is no more than 1-2GB/sec. Higher ingestion rates may still perform well. Performance depends on database size, available resources, and amount of intersection with existing data.

  • Number of materialized views in cluster: The above considerations apply to each individual materialized view defined in the cluster. Each view consumes its own resources, and many views compete with each other on available resources. While there are no hard-coded limits to the number of materialized views in a cluster, the cluster may not be able to handle all materialized views, when there are many defined. The capacity policy can be adjusted if there is more than a single materialized view in the cluster. Increase the value of ClusterMinimumConcurrentOperations in the policy to run more materialized views concurrently.

  • Materialized view definition: The materialized view definition must be defined according to query best practices for best query performance. For more information, see create command performance tips.

Materialized view over materialized view

A materialized view can be created over another materialized view if the source materialized view is a deduplication view. Specifically, the aggregation of the source materialized view must be take_any(*) in order to deduplicate source records. The second materialized view can use any supported aggregation functions. For specific information on how to create a materialized view over a materialized view, see .create materialized-view command.
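
For illustration, a minimal sketch of this pattern, assuming a source table T with EventId and Timestamp columns (all names are illustrative):

.create materialized-view DeduplicatedT on table T
{
    T | summarize take_any(*) by EventId
}

.create materialized-view DailyEventCount on materialized-view DeduplicatedT
{
    DeduplicatedT | summarize count() by bin(Timestamp, 1d)
}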

9.9.2 - Materialized views data purge

This article describes materialized views data purge.

Data purge commands can be used to purge records from materialized views. The same guidelines for purging records from a table apply to materialized views purge.

The purge command only deletes records from the materialized part of the view (what is the materialized part?). Therefore, if the source table of the materialized view includes records to purge, these records may be returned from the materialized view query, even after the purge completes successfully.

The recommended process for purging records from a materialized view is:

  1. Purge the source table of the materialized view.
  2. After the source table purge is completed successfully, purge the materialized view.

Limitations

The purge predicate of a materialized view purge can only reference the group by keys of the aggregation, or any column in an arg_max()/arg_min()/take_any() view. It can’t reference the result columns of other aggregation functions.

For example, for a materialized view MV, which is defined with the following aggregation function:

T | summarize count(), avg(Duration) by UserId

The following purge predicate isn’t valid, since it references the result of the avg() aggregation:

MV | where avg_Duration > 1h
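
In contrast, a predicate that references only the group-by key is valid. For example (the UserId value is illustrative):

MV | where UserId == "someUser@fabrikam.com"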

9.9.3 - Materialized views limitations

This article describes materialized views limitations.

The materialized view source

  • The source table of a materialized view:
    • Must be a table into which data is directly ingested, using an update policy, or ingest from query commands.
      • Using move extents or replace extents from other tables to the source table of the materialized view is only supported if using setNewIngestionTime property as part of the move extents command (refer to .move extents and .replace extents commands for more details).
      • Moving extents to the source table of a materialized view, while not using setNewIngestionTime can cause the move to fail with one of the following errors:
        • Cannot drop/move extents from/to table 'TableName' since Materialized View 'ViewName' is currently processing some of these extents.
        • Cannot move extents to 'TableName' since materialized view 'ViewName' will not process these extents (can lead to data loss in the materialized view).
  • The source table of a materialized view must have IngestionTime policy enabled. This policy is enabled by default.
  • If the materialized view uses a default lookback, the ingestion_time() must be preserved in the materialized view’s query. Operators such as mv-expand or pivot plugin don’t preserve the ingestion_time(), so they can’t be used in a materialized view with a lookback. For more information, see Lookback period.
  • The source table of a materialized view can’t be a table with a restricted view access policy.
  • A materialized view can’t be created on top of another materialized view, unless the first materialized view is of type take_any(*) aggregation. See materialized view over materialized view.
  • Materialized views can’t be defined over external tables.

Impact of records ingested to or dropped from the source table

  • A materialized view only processes new records ingested into the source table. Records that are removed from the source table, either by running data purge/soft delete/drop extents, or due to retention policy or any other reason, have no impact on the materialized view.
  • The materialized view has its own retention policy, which is independent of the retention policy of the source table. The materialized view might include records that aren’t present in the source table.

Follower databases

  • Materialized views can’t be created in follower databases. Follower databases are read-only and materialized views require write operations.
  • Materialized views can’t be created in database shortcuts. Database shortcuts are read-only and materialized views require write operations.
  • Materialized views that are defined on leader databases can be queried from their followers, like any other table in the leader.
  • Use the leader cluster to monitor follower database materialized views. For more information, see Materialized views in follower databases.
  • Use the source Eventhouse to monitor shortcut database materialized views. For more information, see Monitor materialized views.

Other

  • Cursor functions can’t be used on top of materialized views.
  • Continuous export from a materialized view isn’t supported.

9.9.4 - Materialized views policies

This article describes materialized views policies.

This article includes information about policies that can be set on materialized views.

Retention and caching policy

A materialized view has a retention policy and caching policy. The materialized view derives the database retention and caching policies by default. These policies can be changed using retention policy management commands or caching policy management commands.

Both policies are applied on the materialized part of the materialized view only. For an explanation of the differences between the materialized part and delta part, see how materialized views work. For example, if the caching policy of a materialized view is set to 7d, but the caching policy of its source table is set to 0d, there may still be disk misses when querying the materialized view. This behavior occurs because the source table (delta part) also participates in the query.

The retention policy of the materialized view is unrelated to the retention policy of the source table. The retention policy of the source table can be shorter than the retention policy of the materialized view, if source records are required for a shorter period. We recommend a minimum retention policy of at least a few days, and recoverability set to true on the source table. This setting allows for fast recovery from errors and for diagnostic purposes.

The retention and caching policies both depend on Extent Creation time. The last update for a record determines the extent creation time for a materialized view.
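
For example, a sketch of setting the caching policy of a materialized view with the caching policy management commands (the view name and period are illustrative):

.alter materialized-view ViewName policy caching hot = 7d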

Partitioning policy

A partitioning policy can be applied on a materialized view. We recommend configuring a partitioning policy on a materialized view only when most or all of the view queries filter by one of the materialized view’s group-by keys. This situation is common in multi-tenant solutions, where one of the materialized view’s group-by keys is the tenant’s identifier (for example, tenantId, customerId). For more information, see the first use case described in the partitioning policy supported scenarios page.

For the commands to alter a materialized view’s partitioning policy, see partitioning policy commands.

Adding a partitioning policy on a materialized view increases the number of extents in the materialized view, and creates more “work” for the materialization process. For more information on the reason for this behavior, see the extents rebuild process mentioned in how materialized views work.
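
For illustration, a minimal sketch of applying a hash partitioning policy on a materialized view keyed by a tenant identifier (the view name, column name, and property values are illustrative and should be tuned for your data):

.alter materialized-view ViewName policy partitioning ```
{
  "PartitionKeys": [
    {
      "ColumnName": "tenantId",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "PartitionAssignmentMode": "Uniform"
      }
    }
  ]
}
```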

Row level security policy

A row level security policy can be applied to a materialized view, with several limitations:

  • The policy can be applied only to materialized views with arg_max()/arg_min()/take_any() aggregation functions, or when the row level security query references the group by keys of the materialized view aggregation.
  • The policy is applied to the materialized part of the view only.
    • If the same row level security policy isn’t defined on the source table of the materialized view, then querying the materialized view may return records that should be hidden by the policy. This happens because querying the materialized view queries the source table as well.
    • We recommend defining the same row level security policy both on the source table and the materialized view if the view is an arg_max() or arg_min()/take_any().
  • When defining a row level security policy on the source table of an arg_max() or arg_min()/take_any() materialized view, the command fails if there’s no row level security policy defined on the materialized view itself. The purpose of the failure is to alert the user of a potential data leak, since the materialized view may expose information. To mitigate this error, do one of the following actions:
    • Define the row level security policy over the materialized view (a sketch follows this list).
    • Choose to ignore the error by adding allowMaterializedViewsWithoutRowLevelSecurity property to the alter policy command. For example:
    .alter table SourceTable policy row_level_security enable with (allowMaterializedViewsWithoutRowLevelSecurity=true) "RLS_function"
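
For the first mitigation option, a sketch of defining the policy on the materialized view itself (the view name MV and the RLS_function name from the example above are illustrative):

.alter materialized-view MV policy row_level_security enable "RLS_function"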

For commands for configuring a row level security policy on a materialized view, see row_level_security policy commands.

9.9.5 - Materialized views use cases

Learn about common and advanced use cases for materialized views.

Materialized views expose an aggregation query over a source table or another materialized view. This article covers common and advanced use cases for materialized views.

Common use cases

The following are common scenarios that can be addressed by using a materialized view:

  • Update data: Update data by returning the last record per entity using arg_max() (aggregation function). For example, create a view that only materializes records ingested from now on:

    .create materialized-view ArgMax on table T
    {
        T | summarize arg_max(Timestamp, *) by User
    }
    
  • Reduce the resolution of data Reduce the resolution of data by calculating periodic statistics over the raw data. Use various aggregation functions by period of time. For example, maintain an up-to-date snapshot of distinct users per day:

    .create materialized-view UsersByDay on table T
    {
        T | summarize dcount(User) by bin(Timestamp, 1d)
    }
    
  • Deduplicate records: Deduplicate records in a table using take_any() (aggregation function). For example, create a materialized view that deduplicates the source table based on the EventId column, using a lookback of 6 hours. Records are deduplicated against only records ingested 6 hours before current records.

    .create materialized-view with(lookback=6h) DeduplicatedTable on table T
    {
        T
        | summarize take_any(*) by EventId
    }
    

    [!NOTE] You can conceal the source table by creating a function with the same name as the table that references the materialized view instead. This pattern ensures that callers querying the table access the deduplicated materialized view because functions override tables with the same name. To avoid cyclic references in the view definition, use the table() function to reference the source table:

    .create materialized-view DeduplicatedTable on table T
    {
        table('T')
        | summarize take_any(*) by EventId
    }
    

For more examples, see the .create materialized-view command.

Advanced scenario

You can use a materialized view for create/update/delete event processing. For records with incomplete or outdated information in each column, a materialized view can provide the latest updates for each column, excluding entities that were deleted.

Consider the following input table named Events:

Input

TimestampcudIDcol1col2col3
2023-10-24 00:00:00.0000000C112
2023-10-24 01:00:00.0000000U12233
2023-10-24 02:00:00.0000000U123
2023-10-24 00:00:00.0000000C212
2023-10-24 00:10:00.0000000U24
2023-10-24 02:00:00.0000000D2

Create a materialized view to get the latest update per column, using the arg_max() aggregation function:

.create materialized-view ItemHistory on table Events
{
    Events
    | extend Timestamp_col1 = iff(isnull(col1), datetime(1970-01-01), Timestamp),
                Timestamp_col2 = iff(isnull(col2), datetime(1970-01-01), Timestamp),
                Timestamp_col3 = iff(isnull(col3), datetime(1970-01-01), Timestamp)
    | summarize arg_max(Timestamp_col1, col1), arg_max(Timestamp_col2, col2), arg_max(Timestamp_col3, col3), arg_max(Timestamp, cud) by id
}

Output

IDTimestamp_col1col1Timestamp_col2col2Timestamp_col3col3Timestampcud
22023-10-24 00:00:00.000000012023-10-24 00:10:00.000000041970-01-01 00:00:00.00000002023-10-24 02:00:00.0000000D
12023-10-24 00:00:00.000000012023-10-24 02:00:00.0000000232023-10-24 01:00:00.0000000332023-10-24 02:00:00.0000000U

You can create a stored function to further clean the results:

ItemHistory
| project Timestamp, cud, id, col1, col2, col3
| where cud != "D"
| project-away cud

Final Output

The latest update for each column for ID 1, since ID 2 was deleted.

TimestampIDcol1col2col3
2023-10-24 02:00:00.0000000112333
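
For example, the cleanup query above can be wrapped in a stored function so that callers get the cleaned results directly (a sketch; the function name ItemLatestState is illustrative):

.create function ItemLatestState() {
    ItemHistory
    | project Timestamp, cud, id, col1, col2, col3
    | where cud != "D"
    | project-away cud
}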

Materialized views vs. update policies

Materialized views and update policies work differently and serve different use cases. Use the following guidelines to identify which one you should use:

  • Materialized views are suitable for aggregations, while update policies aren’t. Update policies run separately for each ingestion batch, and therefore can only perform aggregations within the same ingestion batch. If you require an aggregation query, always use materialized views.

  • Update policies are useful for data transformations, enrichments with dimension tables (usually using lookup operator) and other data manipulations that can run in the scope of a single ingestion.

  • Update policies run during ingestion time. Data isn’t available for queries in the source table or the target table until all update policies run. Materialized views, on the other hand, aren’t part of the ingestion pipeline. The materialization process runs periodically in the background, post ingestion. Records in source table are available for queries before they’re materialized.

  • Both update policies and materialized views can incorporate joins, but their effectiveness is limited to specific scenarios. Specifically, joins are suitable only when the data required for the join from both sides is accessible at the time of the update policy or materialization process. If matching entities are ingested when the update policy or materialization runs, there’s a risk of overlooking data. See more about dimension tables in materialized view query parameter and in fact and dimension tables.

9.9.6 - Monitor materialized views

This article describes how to monitor materialized views.

Monitor the materialized view’s health using the MaterializedViewHealth and MaterializedViewAge metrics, and by checking the view’s properties with the .show materialized-view command.
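
For example, the following sketch checks a view’s health and last materialization state (ViewName is a placeholder; the projected columns reflect the command’s typical output):

.show materialized-view ViewName
| project Name, IsHealthy, IsEnabled, MaterializedTo, LastRun, LastRunResult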

Troubleshooting unhealthy materialized views

If the MaterializedViewAge metric constantly increases, and the MaterializedViewHealth metric shows that the view is unhealthy, follow these recommendations to identify the root cause:

  • Check the number of materialized views on the cluster, and the current capacity for materialized views:

    .show capacity 
    | where Resource == "MaterializedView"
    | project Resource, Total, Consumed
    

    Output

    ResourceTotalConsumed
    MaterializedView10
    • The number of materialized views that can run concurrently depends on the capacity shown in the Total column, while the Consumed column shows the number of materialized views currently running. You can use the Materialized views capacity policy to specify the minimum and maximum number of concurrent operations, overriding the system’s default concurrency level. The system determines the current concurrency, shown in Total, based on the cluster’s available resources. The following example overrides the system’s decision and changes the minimum concurrent operations from one to three:
    .alter-merge cluster policy capacity '{  "MaterializedViewsCapacity": { "ClusterMinimumConcurrentOperations": 3 } }'
    
    • If you explicitly change this policy, monitor the cluster’s health and ensure that other workloads aren’t affected by this change.
  • Check if there are failures during the materialization process using .show materialized-view failures.

    • If the error is permanent, the system automatically disables the materialized view. To check if it’s disabled, use the .show materialized-view command and see if the value in the IsEnabled column is false. Then check the Journal for the disabled event with the .show journal command. An example of a permanent failure is a source table schema change that makes it incompatible with the materialized view. For more information, see .create materialized-view command.
    • If the failure is transient, the system automatically retries the operation. However, the failure can delay the materialization and increase the age of the materialized view. This type of failure occurs, for example, when hitting memory limits or with a query time-out. See the following recommendations for more ways to troubleshoot transient failures.
  • Analyze the materialization process using the .show commands-and-queries command. Replace Databasename and ViewName to filter for a specific view:

    .show commands-and-queries 
    | where Database  == "DatabaseName" and ClientActivityId startswith "DN.MaterializedViews;ViewName;"
    
    • Check the memory consumption in the MemoryPeak column to identify any operations that failed due to hitting memory limits, such as, runaway queries. By default, the materialization process is limited to a 15-GB memory peak per node. If the queries or commands executed during the materialization process exceed this value, the materialization fails due to memory limits. To increase the memory peak per node, alter the $materialized-views workload group. The following example alters the materialized views workload group to use a maximum of 64-GB memory peak per node during materialization:
    .alter-merge workload_group ['$materialized-views'] ```
    {
      "RequestLimitsPolicy": {
        "MaxMemoryPerQueryPerNode": {
          "Value": 68719241216
        }
      }
    }
    ```
    

    [!NOTE] MaxMemoryPerQueryPerNode can’t exceed 50% of the total memory available on each node.

    • Check if the materialization process is hitting cold cache. The following example shows cache statistics over the past day for the materialized view, ViewName:
    .show commands-and-queries 
    | where ClientActivityId startswith "DN.MaterializedViews;ViewName"
    | where StartedOn > ago(1d)
    | extend HotCacheHits = tolong(CacheStatistics.Shards.Hot.HitBytes), 
             HotCacheMisses = tolong(CacheStatistics.Shards.Hot.MissBytes), 
             HotCacheRetrieved = tolong(CacheStatistics.Shards.Hot.RetrieveBytes), 
             ColdCacheHits = tolong(CacheStatistics.Shards.Cold.HitBytes), 
             ColdCacheMisses = tolong(CacheStatistics.Shards.Cold.MissBytes), 
             ColdCacheRetrieved = tolong(CacheStatistics.Shards.Cold.RetrieveBytes)
    | summarize HotCacheHits = format_bytes(sum(HotCacheHits)), 
                HotCacheMisses = format_bytes(sum(HotCacheMisses)),
                HotCacheRetrieved = format_bytes(sum(HotCacheRetrieved)), 
                ColdCacheHits =format_bytes(sum(ColdCacheHits)), 
                ColdCacheMisses = format_bytes(sum(ColdCacheMisses)),
                ColdCacheRetrieved = format_bytes(sum(ColdCacheRetrieved))
    

    Output

    HotCacheHitsHotCacheMissesHotCacheRetrievedColdCacheHitsColdCacheMissesColdCacheRetrieved
    26 GB0 Bytes0 Bytes1 GB0 Bytes866 MB
    • If the view isn’t fully in the hot cache, materialization can experience disk misses, significantly slowing down the process.
    • Increasing the caching policy for the materialized view helps avoid cache misses. For more information, see hot and cold cache and caching policy and the .alter materialized-view policy caching command.
    • Check if the materialization is scanning old records by checking the ScannedExtentsStatistics with the .show queries command. If the number of scanned extents is high and the MinDataScannedTime is old, the materialization cycle needs to scan all, or most, of the materialized part of the view. The scan is needed to find intersections with the delta. For more information about the delta and the materialized part, see How materialized views work. The following recommendations provide ways to reduce the amount of data scanned in materialized cycles by minimizing the intersection with the delta.
  • If the materialization cycle scans a large amount of data, potentially including cold cache, consider making the following changes to the materialized view definition:

    • Include a datetime group-by key in the view definition. This can significantly reduce the amount of data scanned, as long as there is no late arriving data in this column. For more information, see Performance tips. You need to create a new materialized view since updates to group-by keys aren’t supported.
    • Use a lookback as part of the view definition. For more information, see .create materialized view supported properties.
  • Check whether there’s enough ingestion capacity by verifying if either the MaterializedViewResult metric or IngestionUtilization metric show InsufficientCapacity values. You can increase ingestion capacity by scaling the available resources (preferred) or by altering the ingestion capacity policy.

  • If the materialized view is still unhealthy, then the service doesn’t have sufficient capacity or resources to materialize all the data on time. Consider the following options:

    • Scale out the cluster by increasing the minimum instance count. Optimized autoscale doesn’t take materialized views into consideration and doesn’t scale out the cluster automatically if materialized views are unhealthy. You need to set the minimum instance count to provide the cluster with more resources to accommodate materialized views.
    • Scale out the Eventhouse to provide it with more resources to accommodate materialized views. For more information, see Enable minimum consumption.
    • Divide the materialized view into several smaller views, each covering a subset of the data. For instance, you can split them based on a high cardinality key from the materialized view’s group-by keys. All views are based on the same source table, and each view filters by SourceTable | where hash(key, number_of_views) == i, where i is part of the set {0,1,…,number_of_views-1}. Then, you can define a stored function that unions all the smaller materialized views. Use this function in queries to access the combined data.

    While splitting the view might increase CPU usage, it reduces the memory peak in materialization cycles. Reducing the memory peak can help if the single view is failing due to memory limits.

MaterializedViewResult metric

The MaterializedViewResult metric provides information about the result of a materialization cycle and can be used to identify issues in the materialized view health status. The metric includes the Database, MaterializedViewName, and Result dimensions.

The Result dimension can have one of the following values:

  • Success: The materialization completed successfully.

  • SourceTableNotFound: The source table of the materialized view was dropped, so the materialized view is disabled automatically.

  • SourceTableSchemaChange: The schema of the source table changed in a way that isn’t compatible with the materialized view definition. Since the materialized view query no longer matches the materialized view schema, the materialized view is disabled automatically.

  • InsufficientCapacity: The instance doesn’t have sufficient capacity to materialize the materialized view, due to a lack of ingestion capacity. While insufficient capacity failures can be transient, if they reoccur often, try scaling out the instance or increasing the relevant capacity in the policy.

  • InsufficientResources: The database doesn’t have sufficient resources (CPU/memory) to materialize the materialized view. While insufficient resource errors might be transient, if they reoccur often, try scaling up or scaling out. For more ideas, see Troubleshooting unhealthy materialized views.

Materialized views in follower databases

Materialized views can be defined in follower databases. However, the monitoring of these materialized views should be based on the leader database, where the materialized view is defined. Specifically:

  • Metrics related to materialized view execution (MaterializedViewResult, MaterializedViewExtentsRebuild) are only present in the leader database. Metrics related to monitoring (MaterializedViewAgeSeconds, MaterializedViewHealth, MaterializedViewRecordsInDelta) also appear in the follower databases.
  • The .show materialized-view failures command only works in the leader database.

Track resource consumption

Materialized views resource consumption: the resources consumed by the materialized views materialization process can be tracked using the .show commands-and-queries command. Filter the records for a specific view using the following (replace DatabaseName and ViewName):

.show commands-and-queries 
| where Database  == "DatabaseName" and ClientActivityId startswith "DN.MaterializedViews;ViewName;"

9.10 - Stored query results

9.10.1 - Stored query results

Learn how to manage stored query results.

Stored query results store the result of a query on the service for up to 24 hours. The same principal identity that created the stored query result can reference the results in later queries.

Stored query results can be useful in the following scenarios:

  • Paging through query results. The initial command runs the query and returns the first “page” of records. Later queries reference other “pages” without the need to rerun the query.
  • Drill-down scenarios, in which the results of an initial query are then explored using other queries.

Updates to security policies, such as database access and row level security, aren’t propagated to stored query results. Use .drop stored_query_results if there’s user permission revocation.

Stored query results behave like tables, in that the order of records isn’t preserved. To paginate through the results, we recommend that the query include unique ID columns. If a query returns multiple result sets, only the first result set is stored.

The following table lists the management commands and functions used for managing stored query results:

CommandDescription
.set stored_query_result commandCreates a stored query result to store the results of a query on the service for up to 24 hours.
.show stored_query_result commandShows information on active query results.
.drop stored_query_result commandDeletes active query results.
stored_query_result()Retrieves a stored query result.
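
For example, a sketch of the paging scenario (the Events source table and the stored query result name are illustrative):

.set stored_query_result EventsSnapshot <|
    Events
    | where Timestamp > ago(1d)
    | serialize Num = row_number()

Later queries can then page through the stored result by the Num column:

stored_query_result("EventsSnapshot")
| where Num between (1 .. 1000)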

9.11 - Tables

9.11.1 - Tables management

Learn how to use table management commands to display, create, and alter tables.

This topic discusses the life cycle of tables and associated management commands that are helpful for exploring, creating and altering tables.

Select the links in the table below for more information about them.

For information on optimizing table schema, see Schema optimization best practices.

CommandsOperation
.alter table docstring, .alter table folderManage table display properties
.create ingestion mapping, .show ingestion mappings, .alter ingestion mapping, .drop ingestion mappingManage ingestion mapping
.create tables, .create table, .create-merge tables, .create-merge table, .alter table, .alter-merge table, .drop tables, .drop table, .undo drop table, .rename tableCreate/modify/drop tables
.show tables, .show table details, .show table schemaEnumerate tables in a database
.ingest, .set, .append, .set-or-append (see Data ingestion overview).Data ingestion into a table
.clear table dataClears all the data of a table

CRUD naming conventions for tables

(See full details in the sections linked to in the table, above.)

Command syntaxSemantics
.create entityType entityName ...If an entity of that type and name exists, returns the entity. Otherwise, create the entity.
.create-merge entityType entityName...If an entity of that type and name exists, merge the existing entity with the specified entity. Otherwise, create the entity.
.alter entityType entityName ...If an entity of that type and name does not exist, error. Otherwise, replace it with the specified entity.
.alter-merge entityType entityName ...If an entity of that type and name does not exist, error. Otherwise, merge it with the specified entity.
.drop entityType entityName ...If an entity of that type and name does not exist, error. Otherwise, drop it.
.drop entityType entityName ifexists ...If an entity of that type and name does not exist, return. Otherwise, drop it.
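
For example, a sketch of the difference between .create and .create-merge for tables (the MyLogs table and its columns are illustrative): if MyLogs already exists with the first three columns, the first command returns it unchanged, while the second adds the missing Source column.

.create table MyLogs (Timestamp: datetime, Level: string, Message: string)

.create-merge table MyLogs (Timestamp: datetime, Level: string, Message: string, Source: string)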

10 - Security roles

10.1 - Manage database security roles

Learn how to use management commands to view, add, and remove security roles on a database level.

Principals are granted access to resources through a role-based access control model, where their assigned security roles determine their resource access.

In this article, you’ll learn how to use management commands to view existing security roles and add and drop principal association to security roles on the database level.

Permissions

You must have at least Database Admin permissions to run these commands.

Database level security roles

The following table shows the possible security roles on the database level and describes the permissions granted for each role.

RolePermissions
adminsView and modify the database and database entities.
usersView the database and create new database entities.
viewersView tables in the database where RestrictedViewAccess isn’t turned on.
unrestrictedviewersView the tables in the database even where RestrictedViewAccess is turned on. The principal must also have admins, viewers, or users permissions.
ingestorsIngest data to the database without access to query.
monitorsView database metadata such as schemas, operations, and permissions.

Show existing security roles

Before you add or remove principals, you can use the .show command to see a table with all of the principals and roles that are already set on the database.

Syntax

To show all roles:

.show database DatabaseName principals

To show your roles:

.show database DatabaseName principal roles

Parameters

NameTypeRequiredDescription
DatabaseNamestring✔️The name of the database for which to list principals.

Example

The following command lists all security principals that have access to the Samples database.

.show database Samples principals

Example output

| Role | PrincipalType | PrincipalDisplayName | PrincipalObjectId | PrincipalFQN |
|--|--|--|--|--|
| Database Samples Admin | Microsoft Entra user | Abbi Atkins | cd709aed-a26c-e3953dec735e | aaduser=abbiatkins@fabrikam.com |

Add and drop principal association to security roles

This section provides syntax, parameters, and examples for adding and removing principals to and from security roles.

Syntax

Action database DatabaseName Role ( Principal [, Principal…] ) [skip-results] [ Description ]

Parameters

NameTypeRequiredDescription
Actionstring✔️The command .add, .drop, or .set.
.add adds the specified principals, .drop removes the specified principals, and .set adds the specified principals and removes all previous ones.
DatabaseNamestring✔️The name of the database for which to add principals.
Rolestring✔️The role to assign to the principal. For databases, roles can be admins, users, viewers, unrestrictedviewers, ingestors, or monitors.
Principalstring✔️One or more principals or managed identities. To reference managed identities, use the “App” format using the managed identity object ID or managed identity client (application) ID. For guidance on how to specify these principals, see Referencing Microsoft Entra principals and groups.
skip-resultsstringIf provided, the command won’t return the updated list of database principals.
DescriptionstringText to describe the change that displays when using the .show command.
NameTypeRequiredDescription
Actionstring✔️The command .add, .drop, or .set.
.add adds the specified principals, .drop removes the specified principals, and .set adds the specified principals and removes all previous ones.
DatabaseNamestring✔️The name of the database for which to add principals.
Rolestring✔️The role to assign to the principal. For databases, this can be admins, users, viewers, unrestrictedviewers, ingestors, or monitors.
Principalstring✔️One or more principals. For guidance on how to specify these principals, see Referencing Microsoft Entra principals and groups.
skip-resultsstringIf provided, the command won’t return the updated list of database principals.
DescriptionstringText to describe the change that displays when using the .show command.

Examples

In the following examples, you’ll see how to add security roles, remove security roles, and add and remove security roles in the same command.

Add security roles with .add

The following example adds a principal to the users role on the Samples database.

.add database Samples users ('aaduser=imikeoein@fabrikam.com')
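
The optional skip-results flag and free-text description from the parameters table can be appended to the same command. For example, the following variation suppresses the returned principal list and records a note; the description text is illustrative.

.add database Samples users ('aaduser=imikeoein@fabrikam.com') skip-results 'Grant analyst access'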

The following example adds an application to the viewers role on the Samples database.

.add database Samples viewers ('aadapp=4c7e82bd-6adb-46c3-b413-fdd44834c69b;fabrikam.com')

Remove security roles with .drop

The following example removes all principals in the group from the admins role on the Samples database.

.drop database Samples admins ('aadGroup=SomeGroupEmail@fabrikam.com')

Add new security roles and remove the old with .set

The following example removes existing viewers and adds the provided principals as viewers on the Samples database.

.set database Samples viewers ('aaduser=imikeoein@fabrikam.com', 'aaduser=abbiatkins@fabrikam.com')

Remove all security roles with .set

The following command removes all existing viewers on the Samples database.

.set database Samples viewers none

10.2 - Manage external table roles

Learn how to use management commands to view, add, and remove external table admins on an external table level.

Principals are granted access to resources through a role-based access control model, where their assigned security roles determine their resource access.

On external tables, the only security role is admins. External table admins have the ability to view, modify, and remove the external table.

In this article, you’ll learn how to use management commands to view existing admins as well as add and remove admins on external tables.

Permissions

You must have Database Admin permissions or be an External Table Admin on the specific external table to run these commands. For more information, see role-based access control.

Show existing admins

Before you add or remove principals, you can use the .show command to see a table with all of the principals that already have admin access on the external table.

Syntax

To show all roles:

.show external table ExternalTableName principals

To show your roles:

.show external table ExternalTableName principal roles

Parameters

NameTypeRequiredDescription
ExternalTableNamestring✔️The name of the external table for which to list principals.

Example

The following command lists all security principals that have access to the Samples external table.

.show external table Samples principals

Example output

| Role | PrincipalType | PrincipalDisplayName | PrincipalObjectId | PrincipalFQN |
|--|--|--|--|--|
| External Table Samples Admin | Microsoft Entra user | Abbi Atkins | cd709aed-a26c-e3953dec735e | aaduser=abbiatkins@fabrikam.com |

Add and drop admins

This section provides syntax, parameters, and examples for adding and removing principals.

Syntax

Action external table ExternalTableName admins ( Principal [, Principal…] ) [skip-results] [ Description ]

Parameters

NameTypeRequiredDescription
Actionstring✔️The command .add, .drop, or .set.
.add adds the specified principals, .drop removes the specified principals, and .set adds the specified principals and removes all previous ones.
ExternalTableNamestring✔️The name of the external table for which to add principals.
Principalstring✔️One or more principals. For guidance how to specify these principals, see Referencing security principals.
skip-resultsstringIf provided, the command won’t return the updated list of external table principals.
DescriptionstringText to describe the change that will be displayed when using the .show command.

Examples

In the following examples, you’ll see how to add admins, remove admins, and add and remove admins in the same command.

Add admins with .add

The following example adds a principal to the admins role on the Samples external table.

.add external table Samples admins ('aaduser=imikeoein@fabrikam.com')

Remove admins with .drop

The following example removes all principals in the group from the admins role on the Samples external table.

.drop external table Samples admins ('aadGroup=SomeGroupEmail@fabrikam.com')

Add new admins and remove the old with .set

The following example removes existing admins and adds the provided principals as admins on the Samples external table.

.set external table Samples admins ('aaduser=imikeoein@fabrikam.com', 'aaduser=abbiatkins@fabrikam.com')

Remove all admins with .set

The following command removes all existing admins on the Samples external table.

.set external table Samples admins none

10.3 - Manage function roles

Learn how to use management commands to view, add, and remove function admins on a function level.

Principals are granted access to resources through a role-based access control model, where their assigned security roles determine their resource access.

On functions, the only security role is admins. Function admins have the ability to view, modify, and remove the function.

In this article, you’ll learn how to use management commands to view existing admins as well as add and remove admins on functions.

Permissions

You must have Database Admin permissions or be a Function Admin on the specific function to run these commands. For more information, see role-based access control.

Show existing admins

Before you add or remove principals, you can use the .show command to see a table with all of the principals that already have admin access on the function.

Syntax

To show all roles:

.show function FunctionName principals

To show your roles:

.show function FunctionName principal roles

Parameters

NameTypeRequiredDescription
FunctionNamestring✔️The name of the function for which to list principals.

Example

The following command lists all security principals that have access to the SampleFunction function.

.show function SampleFunction principals

Example output

| Role | PrincipalType | PrincipalDisplayName | PrincipalObjectId | PrincipalFQN |
|--|--|--|--|--|
| Function SampleFunction Admin | Microsoft Entra user | Abbi Atkins | cd709aed-a26c-e3953dec735e | aaduser=abbiatkins@fabrikam.com |

Add and drop admins

This section provides syntax, parameters, and examples for adding and removing principals.

Syntax

Action function FunctionName admins ( Principal [, Principal…] ) [skip-results] [ Description ]

Parameters

NameTypeRequiredDescription
Actionstring✔️The command .add, .drop, or .set.
.add adds the specified principals, .drop removes the specified principals, and .set adds the specified principals and removes all previous ones.
FunctionNamestring✔️The name of the function for which to add principals.
Principalstring✔️One or more principals. For guidance on how to specify these principals, see Referencing security principals.
skip-resultsstringIf provided, the command won’t return the updated list of function principals.
DescriptionstringText to describe the change that will be displayed when using the .show command.

Examples

In the following examples, you’ll see how to add admins, remove admins, and add and remove admins in the same command.

Add admins with .add

The following example adds a principal to the admins role on the SampleFunction function.

.add function SampleFunction admins ('aaduser=imikeoein@fabrikam.com')

Remove admins with .drop

The following example removes all principals in the group from the admins role on the SampleFunction function.

.drop function SampleFunction admins ('aadGroup=SomeGroupEmail@fabrikam.com')

Add new admins and remove the old with .set

The following example removes existing admins and adds the provided principals as admins on the SampleFunction function.

.set function SampleFunction admins ('aaduser=imikeoein@fabrikam.com', 'aaduser=abbiatkins@fabrikam.com')

Remove all admins with .set

The following command removes all existing admins on the SampleFunction function.

.set function SampleFunction admins none

10.4 - Manage materialized view roles

Learn how to use management commands to view, add, and remove materialized view admins on a materialized view level.

Principals are granted access to resources through a role-based access control model, where their assigned security roles determine their resource access.

On materialized views, the only security role is admins. Materialized view admins have the ability to view, modify, and remove the materialized view.

In this article, you’ll learn how to use management commands to view existing admins as well as add and remove admins on materialized views.

Permissions

You must have Database Admin permissions or be a Materialized View Admin on the specific materialized view to run these commands. For more information, see role-based access control.

Show existing admins

Before you add or remove principals, you can use the .show command to see a table with all of the principals that already have admin access on the materialized view.

Syntax

To show all roles:

.show materialized-view MaterializedViewName principals

To show your roles:

.show materialized-view MaterializedViewName principal roles

Parameters

NameTypeRequiredDescription
MaterializedViewNamestring✔️The name of the materialized view for which to list principals.

Example

The following command lists all security principals that have access to the SampleView materialized view.

.show materialized-view SampleView principals

Example output

| Role | PrincipalType | PrincipalDisplayName | PrincipalObjectId | PrincipalFQN |
|--|--|--|--|--|
| Materialized View SampleView Admin | Microsoft Entra user | Abbi Atkins | cd709aed-a26c-e3953dec735e | aaduser=abbiatkins@fabrikam.com |

Add and drop admins

This section provides syntax, parameters, and examples for adding and removing principals.

Syntax

Action materialized-view MaterializedViewName admins ( Principal [, Principal…] ) [skip-results] [ Description ]

Parameters

NameTypeRequiredDescription
Actionstring✔️The command .add, .drop, or .set.
.add adds the specified principals, .drop removes the specified principals, and .set adds the specified principals and removes all previous ones.
MaterializedViewNamestring✔️The name of the materialized view for which to add principals.
Principalstring✔️One or more principals. For guidance on how to specify these principals, see Referencing security principals.
skip-resultsstringIf provided, the command won’t return the updated list of materialized view principals.
DescriptionstringText to describe the change that will be displayed when using the .show command.

Examples

In the following examples, you’ll see how to add admins, remove admins, and add and remove admins in the same command.

Add admins with .add

The following example adds a principal to the admins role on the SampleView materialized view.

.add materialized-view SampleView admins ('aaduser=imikeoein@fabrikam.com')

Remove admins with .drop

The following example removes all principals in the group from the admins role on the SampleView materialized view.

.drop materialized-view SampleView admins ('aadGroup=SomeGroupEmail@fabrikam.com')

Add new admins and remove the old with .set

The following example removes existing admins and adds the provided principals as admins on the SampleView materialized view.

.set materialized-view SampleView admins ('aaduser=imikeoein@fabrikam.com', 'aaduser=abbiatkins@fabrikam.com')

Remove all admins with .set

The following command removes all existing admins on the SampleView materialized view.

.set materialized-view SampleView admins none

10.5 - Referencing security principals

Learn how to reference security principals and identity providers.

The authorization model allows for the use of Microsoft Entra user and application identities and Microsoft Accounts (MSAs) as security principals. This article provides an overview of the supported principal types for both Microsoft Entra ID and MSAs, and demonstrates how to properly reference these principals when assigning security roles using management commands.

Microsoft Entra ID

The recommended way to access your environment is by authenticating to the Microsoft Entra service. Microsoft Entra ID is an identity provider capable of authenticating security principals and coordinating with other identity providers, such as Microsoft’s Active Directory.

Microsoft Entra ID supports the following authentication scenarios:

  • User authentication (interactive sign-in): Used to authenticate human principals.
  • Application authentication (non-interactive sign-in): Used to authenticate services and applications that have to run or authenticate without user interaction.

Referencing Microsoft Entra principals and groups

The syntax for referencing Microsoft Entra user and application principals and groups is outlined in the following table.

If you use a User Principal Name (UPN) to reference a user principal, an attempt is made to infer the tenant from the domain name and find the principal. If the principal isn’t found, explicitly specify the tenant ID or name in addition to the user’s UPN or object ID.

Similarly, you can reference a security group with the group email address in UPN format and an attempt will be made to infer the tenant from the domain name. If the group isn’t found, explicitly specify the tenant ID or name in addition to the group display name or object ID.

Type of EntityMicrosoft Entra tenantSyntax
UserImplicitaaduser=UPN
UserExplicit (ID)aaduser=UPN;TenantId
or
aaduser=ObjectID;TenantId
UserExplicit (Name)aaduser=UPN;TenantName
or
aaduser=ObjectID;TenantName
GroupImplicitaadgroup=GroupEmailAddress
GroupExplicit (ID)aadgroup=GroupDisplayName;TenantId
or
aadgroup=GroupObjectId;TenantId
GroupExplicit (Name)aadgroup=GroupDisplayName;TenantName
or
aadgroup=GroupObjectId;TenantName
AppExplicit (ID)aadapp=ApplicationDisplayName;TenantId
or
aadapp=ApplicationId;TenantId
AppExplicit (Name)aadapp=ApplicationDisplayName;TenantName
or
aadapp=ApplicationId;TenantName

Examples

The following example uses the user’s UPN to assign the principal to the users role on the Test database. The tenant information isn’t specified, so your cluster will attempt to resolve the Microsoft Entra tenant using the UPN.

.add database Test users ('aaduser=imikeoein@fabrikam.com') 'Test user (AAD)'

The following example uses a group name and tenant name to assign the group to the users role on the Test database.

.add database Test users ('aadgroup=SGDisplayName;fabrikam.com') 'Test group @fabrikam.com (AAD)'

The following example uses an app ID and tenant name to assign the app to the users role on the Test database.

.add database Test users ('aadapp=4c7e82bd-6adb-46c3-b413-fdd44834c69b;fabrikam.com') 'Test app @fabrikam.com (AAD)'
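
If the tenant can’t be inferred from the UPN, you can specify it explicitly using the aaduser=UPN;TenantId form from the table above. The following sketch uses the same illustrative user, where <TenantId> is a placeholder for your Microsoft Entra tenant ID.

.add database Test users ('aaduser=imikeoein@fabrikam.com;<TenantId>') 'Test user with explicit tenant (AAD)'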

Microsoft Accounts (MSAs)

User authentication for Microsoft Accounts (MSAs) is supported. MSAs are Microsoft-managed non-organizational user accounts, such as accounts on hotmail.com, live.com, and outlook.com.

Referencing MSA principals

IdPTypeSyntax
Live.comUsermsauser=UPN

Example

The following example assigns an MSA user to the users role on the Test database.

.add database Test users ('msauser=abbiatkins@live.com') 'Test user (live.com)'


10.6 - Security roles

Learn how to use security roles to provide principals access to resources.

Principals are granted access to resources through a role-based access control model, where their assigned security roles determine their resource access.

When a principal attempts an operation, the system performs an authorization check to make sure the principal is associated with at least one security role that grants permissions to perform the operation. Failing an authorization check aborts the operation.

The management commands listed in this article can be used to manage principals and their security roles on databases, tables, external tables, materialized views, and functions.

To learn how to configure them in the Azure portal, see Manage cluster permissions.

Management commands

The following table describes the commands used for managing security roles.

CommandDescription
.showLists principals with the given role.
.addAdds one or more principals to the role.
.dropRemoves one or more principals from the role.
.setSets the role to the specific list of principals, removing all previous ones.

Security roles

The following table describes the level of access granted for each role and shows a check if the role can be assigned within the given object type.

RolePermissionsDatabasesTablesExternal tablesMaterialized viewsFunctions
adminsView, modify, and remove the object and subobjects.✔️✔️✔️✔️✔️
usersView the object and create new subobjects.✔️
viewersView the object where RestrictedViewAccess isn’t turned on.✔️
unrestrictedviewersView the object even where RestrictedViewAccess is turned on. The principal must also have admins, viewers or users permissions.✔️
ingestorsIngest data to the object without access to query.✔️✔️
monitorsView metadata such as schemas, operations, and permissions.✔️

For a full description of the security roles at each scope, see Kusto role-based access control.

Common scenarios

Show your principal roles

To see your own roles on the cluster or eventhouse, run the following command:

.show cluster principal roles

Show your roles on a resource

To check the roles assigned to you on a specific resource, run the following command within the relevant database or the database that contains the resource:

// For a database:
.show database DatabaseName principal roles

// For a table:
.show table TableName principal roles

// For an external table:
.show external table ExternalTableName principal roles

// For a function:
.show function FunctionName principal roles

// For a materialized view:
.show materialized-view MaterializedViewName principal roles

Show the roles of all principals on a resource

To see the roles assigned to all principals for a particular resource, run the following command within the relevant database or the database that contains the resource:

// For a database:
.show database DatabaseName principals

// For a table:
.show table TableName principals

// For an external table:
.show external table ExternalTableName principals

// For a function:
.show function FunctionName principals

// For a materialized view:
.show materialized-view MaterializedViewName principals

Modify the role assignments

For details on how to modify your role assignments at the database and table levels, see Manage database security roles and Manage table security roles.

10.7 - Access control

10.7.1 - Access Control Overview

This article describes Access control.

Access control is based on authentication and authorization. Each query and command on an Azure Data Explorer resource, such as a cluster or database, must pass both authentication and authorization checks.

Access control is based on authentication and authorization. Each query and command on a Fabric resource, such as a database, must pass both authentication and authorization checks.

  • Authentication: Validates the identity of the security principal making a request
  • Authorization: Validates the security principal making a request is permitted to make that request on the target resource

Authentication

To programmatically authenticate, a client must communicate with Microsoft Entra ID and request an access token specific to the Kusto service. Then, the client can use the acquired access token as proof of identity when issuing requests to your database.

The main authentication scenarios are as follows:

User authentication

User authentication happens when a user presents credentials to Microsoft Entra ID or an identity provider that federates with Microsoft Entra ID, such as Active Directory Federation Services. The user gets back a security token that can be presented to the Azure Data Explorer service. Azure Data Explorer determines whether the token is valid, whether the token is issued by a trusted issuer, and what security claims the token contains.

Azure Data Explorer supports the following methods of user authentication, including through the Kusto client libraries:

  • Interactive user authentication with sign-in through the user interface.
  • User authentication with a Microsoft Entra token issued for Azure Data Explorer.
  • User authentication with a Microsoft Entra token issued for another resource that can be exchanged for an Azure Data Explorer token using On-behalf-of (OBO) authentication.

Application authentication

Application authentication is needed when requests aren’t associated with a specific user or when no user is available to provide credentials. In this case, the application authenticates to Microsoft Entra ID or the federated IdP by presenting secret information.

Azure Data Explorer supports the following methods of application authentication, including through the Kusto client libraries:

  • Application authentication with an Azure managed identity.
  • Application authentication with an X.509v2 certificate installed locally.
  • Application authentication with an X.509v2 certificate given to the client library as a byte stream.
  • Application authentication with a Microsoft Entra application ID and a Microsoft Entra application key. The application ID and application key are like a username and password.
  • Application authentication with a previously obtained valid Microsoft Entra token, issued to Azure Data Explorer.
  • Application authentication with a Microsoft Entra token issued for another resource that can be exchanged for an Azure Data Explorer token using On-behalf-of (OBO) authentication.

Authorization

Before carrying out an action on a resource, all authenticated users must pass an authorization check. The Kusto role-based access control model is used, where principals are ascribed to one or more security roles. Authorization is granted as long as one of the roles assigned to the user allows them to perform the specified action. For example, the Database User role grants security principals the right to read the data of a particular database, create tables in the database, and more.

The association of security principals to security roles can be defined individually or by using security groups that are defined in Microsoft Entra ID. For more information on how to assign security roles, see Security roles overview.

Group authorization

Authorization can be granted to Microsoft Entra ID groups by assigning one or more roles to the group.

When checking authorization for a user or application principal, the system first looks for an explicit role assignment that permits the specific action. If no such role assignment exists, the system checks the principal’s membership in all groups that could authorize the action.

If the principal is a member of a group with appropriate permissions, the requested action is authorized. Otherwise, the action doesn’t pass the authorization check and is disallowed.

Force group membership refresh

Principals can force a refresh of group membership information. This capability is useful in scenarios where just-in-time (JIT) privileged access services, such as Microsoft Entra Privileged Identity Management (PIM), are used to obtain higher privileges on a resource.

Refresh for a specific group

Principals can force a refresh of group membership for a specific group. However, the following restrictions apply:

  • A refresh can be requested up to 10 times per hour per principal.
  • The requesting principal must be a member of the group at the time of the request.

The request results in an error if either of these conditions isn’t met.

To reevaluate the current principal’s membership of a group, run the following command:

.clear cluster cache groupmembership with (group='<GroupFQN>')

Use the group’s fully qualified name (FQN). For more information, see Referencing Microsoft Entra principals and groups.

Refresh for other principals

A privileged principal can request a refresh for other principals. The requesting principal must have AllDatabaseMonitor access for the target service. Privileged principals can also run the previous command without restrictions.

To refresh another principal’s group membership, run the following command:

.clear cluster cache groupmembership with (principal='<PrincipalFQN>', group='<GroupFQN>')

10.7.2 - Microsoft Entra application registration

This article describes how to create a Microsoft Entra app registration for authentication.

Microsoft Entra application authentication requires creating and registering an application with Microsoft Entra ID. A service principal is automatically created when the application registration is created in a Microsoft Entra tenant.

The app registration can be created either in the Azure portal or programmatically with the Azure CLI. Choose the tab that fits your scenario.

Portal

Register the app

  1. Sign in to Azure portal and open the Microsoft Entra ID blade.

  2. Browse to App registrations and select New registration.

    Screenshot showing how to start a new app registration.

  3. Name the application, for example “example-app”.

  4. Select a supported account type, which determines who can use the application.

  5. Under Redirect URI, select Web for the type of application you want to create. The URI is optional and is left blank in this case.

    Screenshot showing how to register a new app registration.

  6. Select Register.

Set up authentication

There are two types of authentication available for service principals: password-based authentication (application secret) and certificate-based authentication. The following section describes using a password-based authentication for the application’s credentials. You can alternatively use an X509 certificate to authenticate your application. For more information, see How to configure Microsoft Entra certificate-based authentication.

In this section, you’ll copy the following values: the application ID and the key value. Paste these values somewhere safe, such as a text editor, for use in the later step where you configure client credentials for the database.

  1. Browse to the Overview blade.

  2. Copy the Application (client) ID and the Directory (tenant) ID.

    [!NOTE] You’ll need the application ID and the tenant ID to authorize the service principal to access the database.

  3. In the Certificates & secrets blade, select New client secret.

    Screenshot showing how to start the creation of client secret.

  4. Enter a description and expiration.

  5. Select Add.

  6. Copy the key value.

    [!NOTE] When you leave this page, the key value won’t be accessible.

You’ve created your Microsoft Entra application and service principal.

Azure CLI

  1. Sign in to your Azure subscription via Azure CLI. Then authenticate in the browser.

    az login
    
  2. Choose the subscription to host the principal. This step is needed when you have multiple subscriptions.

    az account set --subscription YOUR_SUBSCRIPTION_GUID
    
  3. Create the service principal. In this example, the service principal is called my-service-principal.

    az ad sp create-for-rbac -n "my-service-principal" --role Contributor --scopes /subscriptions/{SubID}
    
  4. From the returned JSON data, copy the appId, password, and tenant for future use.

    {
      "appId": "00001111-aaaa-2222-bbbb-3333cccc4444",
      "displayName": "my-service-principal",
      "name": "my-service-principal",
      "password": "00001111-aaaa-2222-bbbb-3333cccc4444",
      "tenant": "00001111-aaaa-2222-bbbb-3333cccc4444"
    }
    

You’ve created your Microsoft Entra application and service principal.

Configure delegated permissions for the application - optional

If your application needs to access your database using the credentials of the calling user, configure delegated permissions for your application. For example, if you’re building a web API and you want to authenticate using the credentials of the user who is calling your API.

If you only need access to an authorized data resource, you can skip this section and continue to Grant a service principal access to the database.

  1. Browse to the API permissions blade of your App registration.

  2. Select Add a permission.

  3. Select APIs my organization uses.

  4. Search for and select Azure Data Explorer.

    Screenshot showing how to add Azure Data Explorer API permission.

  5. In Delegated permissions, select the user_impersonation box.

  6. Select Add permissions.

    Screenshot showing how to select delegated permissions with user impersonation.

Grant a service principal access to the database

Once your application registration is created, you need to grant the corresponding service principal access to your database. The following example gives viewer access. For other roles, see Kusto role-based access control.

  1. Use the values of Application ID and Tenant ID as copied in a previous step.

  2. Execute the following command in your query editor, replacing the placeholder values ApplicationID and TenantID with your actual values:

    .add database <DatabaseName> viewers ('aadapp=<ApplicationID>;<TenantID>') '<Notes>'
    

    For example:

    .add database Logs viewers ('aadapp=00001111-aaaa-2222-bbbb-3333cccc4444;9876abcd-e5f6-g7h8-i9j0-1234kl5678mn') 'App Registration'
    

    The last parameter is a string that shows up as notes when you query the roles associated with a database.

    [!NOTE] After creating the application registration, there might be a several minute delay until it can be referenced. If you receive an error that the application is not found, wait and try again.

For more information on roles, see Role-based access control.

Use application credentials to access a database

Use the application credentials to programmatically access your database by using the client library.

. . .
string applicationClientId = "<myClientID>";
string applicationKey = "<myApplicationKey>";
string authority = "<myApplicationTenantID>";
. . .
var kcsb = new KustoConnectionStringBuilder($"https://{clusterName}.kusto.windows.net/{databaseName}")
    .WithAadApplicationKeyAuthentication(
        applicationClientId,
        applicationKey,
        authority);
var client = KustoClientFactory.CreateCslQueryProvider(kcsb);
var queryResult = client.ExecuteQuery($"{query}");

[!NOTE] Specify the application id and key of the application registration (service principal) created earlier.

For more information, see How to authenticate with Microsoft Authentication Library (MSAL) in apps and use Azure Key Vault with .NET Core web app.

Troubleshooting

Invalid resource error

If your application is used to authenticate users or applications for access, you must set up delegated permissions for the service application; that is, declare that your application can authenticate users or applications for access. Not doing so results in an error similar to the following when an authentication attempt is made:

AADSTS650057: Invalid resource. The client has requested access to a resource which is not listed in the requested permissions in the client's application registration...

Your Microsoft Entra tenant administrator might enact a policy that prevents tenant users from giving consent to applications. This situation will result in an error similar to the following, when a user tries to sign in to your application:

AADSTS65001: The user or administrator has not consented to use the application with ID '<App ID>' named 'App Name'

You’ll need to contact your Microsoft Entra administrator to grant consent for all users in the tenant, or enable user consent for your specific application.

10.7.3 - Role-based access control

This article describes role-based access control.

Azure Data Explorer uses a role-based access control (RBAC) model in which principals get access to resources based on their assigned roles. Roles are defined for a specific cluster, database, table, external table, materialized view, or function. When defined for a cluster, the role applies to all databases in the cluster. When defined for a database, the role applies to all entities in the database.

Azure Resource Manager (ARM) roles, such as subscription owner or cluster owner, grant access permissions for resource administration. For data administration, you need the roles described in this document.

Real-Time Intelligence in Fabric uses a hybrid role-based access control (RBAC) model in which principals get access to resources based on roles granted from one or both of two sources: Fabric and Kusto management commands. The user has the union of the roles granted from both sources.

Within Fabric, roles can be assigned or inherited by assigning a role in a workspace, or by sharing a specific item based on the item permission model.

Fabric roles

RolePermissions granted on items
Workspace AdminAdmin RBAC role on all items in the workspace.
Workspace MemberAdmin RBAC role on all items in the workspace.
Workspace ContributorAdmin RBAC role on all items in the workspace.
Workspace ViewerViewer RBAC role on all items in the workspace.
Item EditorAdmin RBAC role on the item.
Item ViewerViewer RBAC role on the item.

Roles can further be defined on the data plane for a specific database, table, external table, materialized view, or function, by using management commands. In both cases, roles applied at a higher level (Workspace, Eventhouse) are inherited by lower levels (Database, Table).

Roles and permissions

The following table outlines the roles and permissions available at each scope.

The Permissions column displays the access granted to each role.

The Dependencies column lists the minimum roles required to obtain the role in that row. For example, to become a Table Admin, you must first have a role like Database User or a role that includes the permissions of Database User, such as Database Admin or AllDatabasesAdmin. When multiple roles are listed in the Dependencies column, only one of them is needed to obtain the role.

The How the role is obtained column offers ways that the role can be granted or inherited.

The Manage column offers ways to add or remove role principals.

| Scope | Role | Permissions | Dependencies | Manage |
|--|--|--|--|--|
| Cluster | AllDatabasesAdmin | Full permission to all databases in the cluster. May show and alter certain cluster-level policies. Includes all permissions. | | Azure portal |
| Cluster | AllDatabasesViewer | Read all data and metadata of any database in the cluster. | | Azure portal |
| Cluster | AllDatabasesMonitor | Execute .show commands in the context of any database in the cluster. | | Azure portal |
| Database | Admin | Full permission in the scope of a particular database. Includes all lower level permissions. | | Azure portal or management commands |
| Database | User | Read all data and metadata of the database. Create tables and functions, and become the admin for those tables and functions. | | Azure portal or management commands |
| Database | Viewer | Read all data and metadata, except for tables with the RestrictedViewAccess policy turned on. | | Azure portal or management commands |
| Database | Unrestrictedviewer | Read all data and metadata, including in tables with the RestrictedViewAccess policy turned on. | Database User or Database Viewer | Azure portal or management commands |
| Database | Ingestor | Ingest data to all tables in the database without access to query the data. | | Azure portal or management commands |
| Database | Monitor | Execute .show commands in the context of the database and its child entities. | | Azure portal or management commands |
| Table | Admin | Full permission in the scope of a particular table. | Database User | management commands |
| Table | Ingestor | Ingest data to the table without access to query the data. | Database User or Database Ingestor | management commands |
| External Table | Admin | Full permission in the scope of a particular external table. | Database User or Database Viewer | management commands |
| Materialized view | Admin | Full permission to alter the view, delete the view, and grant admin permissions to another principal. | Database User or Table Admin | management commands |
| Function | Admin | Full permission to alter the function, delete the function, and grant admin permissions to another principal. | Database User or Table Admin | management commands |

| Scope | Role | Permissions | How the role is obtained |
|--|--|--|--|
| Eventhouse | AllDatabasesAdmin | Full permission to all databases in the Eventhouse. May show and alter certain Eventhouse-level policies. Includes all permissions. | Inherited as workspace admin, workspace member, or workspace contributor. Can’t be assigned with management commands. |
| Database | Admin | Full permission in the scope of a particular database. Includes all lower level permissions. | Inherited as workspace admin, workspace member, or workspace contributor; item shared with editing permissions; assigned with management commands. |
| Database | User | Read all data and metadata of the database. Create tables and functions, and become the admin for those tables and functions. | Assigned with management commands. |
| Database | Viewer | Read all data and metadata, except for tables with the RestrictedViewAccess policy turned on. | Item shared with viewing permissions; assigned with management commands. |
| Database | Unrestrictedviewer | Read all data and metadata, including in tables with the RestrictedViewAccess policy turned on. | Assigned with management commands. Dependent on having Database User or Database Viewer. |
| Database | Ingestor | Ingest data to all tables in the database without access to query the data. | Assigned with management commands. |
| Database | Monitor | Execute .show commands in the context of the database and its child entities. | Assigned with management commands. |
| Table | Admin | Full permission in the scope of a particular table. | Inherited as workspace admin, workspace member, or workspace contributor; parent item (KQL Database) shared with editing permissions; assigned with management commands, dependent on having Database User on the parent database. |
| Table | Ingestor | Ingest data to the table without access to query the data. | Assigned with management commands. Dependent on having Database User or Database Ingestor on the parent database. |
| External Table | Admin | Full permission in the scope of a particular external table. | Assigned with management commands. Dependent on having Database User or Database Viewer on the parent database. |

10.8 - Manage table roles

10.8.1 - Manage table security roles

Learn how to use management commands to view, add, and remove security roles on a table level.

Principals are granted access to resources through a role-based access control model, where their assigned security roles determine their resource access.

In this article, you’ll learn how to use management commands to view existing security roles as well as add and remove security roles on the table level.

Permissions

You must have at least Table Admin permissions to run these commands.

Table level security roles

The following table shows the possible security roles on the table level and describes the permissions granted for each role.

RolePermissions
adminsView, modify, and remove the table and table entities.
ingestorsIngest data to the table without access to query.

Show existing security roles

Before you add or remove principals, you can use the .show command to see a table with all of the principals and roles that are already set on the table.

Syntax

To show all roles:

.show table TableName principals

To show your roles:

.show table TableName principal roles

Parameters

NameTypeRequiredDescription
TableNamestring✔️The name of the table for which to list principals.

Example

The following command lists all security principals that have access to the StormEvents table.

.show table StormEvents principals

Example output

| Role | PrincipalType | PrincipalDisplayName | PrincipalObjectId | PrincipalFQN |
|--|--|--|--|--|
| Table StormEvents Admin | Microsoft Entra user | Abbi Atkins | cd709aed-a26c-e3953dec735e | aaduser=abbiatkins@fabrikam.com |

Add and drop security roles

This section provides syntax, parameters, and examples for adding and removing principals.

Syntax

Action table TableName Role ( Principal [, Principal…] ) [skip-results] [ Description ]

Parameters

NameTypeRequiredDescription
Actionstring✔️The command .add, .drop, or .set.
.add adds the specified principals, .drop removes the specified principals, and .set adds the specified principals and removes all previous ones.
TableNamestring✔️The name of the table for which to add principals.
Rolestring✔️The role to assign to the principal. For tables, this can be admins or ingestors.
Principalstring✔️One or more principals. For guidance on how to specify these principals, see Referencing security principals.
skip-resultsstringIf provided, the command won’t return the updated list of table principals.
DescriptionstringText to describe the change that will be displayed when using the .show command.

Examples

In the following examples, you’ll see how to add security roles, remove security roles, and add and remove security roles in the same command.

Add security roles with .add

The following example adds a principal to the admins role on the StormEvents table.

.add table StormEvents admins ('aaduser=imikeoein@fabrikam.com')

The following example adds an application to the ingestors role on the StormEvents table.

.add table StormEvents ingestors ('aadapp=4c7e82bd-6adb-46c3-b413-fdd44834c69b;fabrikam.com')

Remove security roles with .drop

The following example removes all principals in the group from the admins role on the StormEvents table.

.drop table StormEvents admins ('aadGroup=SomeGroupEmail@fabrikam.com')

Add new security roles and remove the old with .set

The following example removes existing ingestors and adds the provided principals as ingestors on the StormEvents table.

.set table StormEvents ingestors ('aaduser=imikeoein@fabrikam.com', 'aaduser=abbiatkins@fabrikam.com')

Remove all security roles with .set

The following command removes all existing ingestors on the StormEvents table.

.set table StormEvents ingestors none

10.8.2 - Manage view access to tables

Learn how to grant view access to tables in a database.

Principals gain access to resources, such as databases and tables, based on their assigned security roles. The viewer security role is only available at the database level, and assigning a principal this role gives them view access to all tables in the database.

In this article, you learn methods for controlling a principal’s table view access.

Structure data for controlled access

To control access more effectively, we recommend that you separate tables into different databases based on access privileges. For instance, create a distinct database for sensitive data and restrict access to specific principals by assigning them the relevant security roles.

Restricted View Access policy

To restrict access to specific tables, you can turn on the Restricted View Access policy for those tables. This policy ensures that only principals with the unrestrictedViewer role can access the table. Meanwhile, principals with the regular viewer role can’t view the table.
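
A minimal sketch of turning the policy on for a hypothetical table named SensitiveEvents:

.alter table SensitiveEvents policy restricted_view_access true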

Row Level Security policy

The Row Level Security (RLS) policy allows you to restrict access to rows of data based on specific criteria and allows masking data in columns. When you create an RLS policy on a table, the restriction applies to all users, including database administrators and the RLS creator.
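
For illustration, the following sketch restricts a hypothetical Sales table to rows from one region by enabling an RLS policy that points at a filtering function. The table, column, and function names are illustrative.

.create-or-alter function SalesWestOnly() {
    Sales
    | where Region == "West"
}

.alter table Sales policy row_level_security enable "SalesWestOnly"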

Create a follower database

Create a follower database and follow only the relevant tables that you’d like to share with the specific principal or set of principals.

Create a database shortcut in Fabric and follow only the relevant tables that you’d like to share with the specific principal or set of principals.

11 - Operations

11.1 - Estimate table size

Learn how to estimate table size.

Understanding the size of a table can be helpful for efficient resource management and optimized query performance. In this article, you’ll learn different methods to estimate table sizes and how to use them effectively.

Original size of ingested data

Use the .show table details command to estimate the original data size of a table. For an example, see Use .show table details.

This command provides an estimation of the uncompressed size of data ingested into your table based on the assumption that the data was transferred in CSV format. The estimation is based on approximate lengths of numeric values, such as integers, longs, datetimes, and guids, by considering their string representations.

Example use case: Track the size of incoming data over time to make informed decisions about capacity planning.

Table size in terms of access bytes

Use the estimate_data_size() function along with the sum() aggregation function to estimate table size based on data types and their respective byte sizes. For an example, see Use estimate_data_size().

This method provides a more precise estimation by considering the byte sizes of numeric values without formatting them as strings. For example, integer values require 4 bytes whereas long and datetime values require 8 bytes. By using this approach, you can accurately estimate the data size that would fit in memory.

Example use case: Determine the cost of a query in terms of bytes to be scanned.

Combined size of multiple tables

You can use the union operator along with the estimate_data_size() and sum() functions to estimate the combined size of multiple tables in terms of access bytes. For an example, see Use union with estimate_data_size().

Example use case: Assess the memory requirements for consolidating data from multiple tables into a single dataset.

Examples

Use .show table details

The following query estimates the original data size of the StormEvents table.

.show table StormEvents details
| project TotalOriginalSize

Output

TotalOriginalSize
60192011

Use estimate_data_size()

The following query estimates the data size of the StormEvents table in bytes.

StormEvents
| extend sizeEstimateOfColumn = estimate_data_size(*)
| summarize totalSize=sum(sizeEstimateOfColumn)

Output

totalSize
58608932

Use union with estimate_data_size()

The following query estimates the data size for all tables in the Samples database.

union withsource=_TableName *
| extend sizeEstimateOfColumn = estimate_data_size(*)
| summarize totalSize=sum(sizeEstimateOfColumn)
| extend sizeGB = format_bytes(totalSize,2,"GB")
totalSizesizeGB
17617824539261640.79 GB

11.2 - Journal management

This article describes Journal management.

The journal contains information about metadata operations done on your database.

The metadata operations can result from a management command that a user executed, or from internal management commands that the system executed, such as dropping extents by retention.

Taking a dependency on the journal’s contents isn’t recommended.

| Event | EventTimestamp | Database | EntityName | UpdatedEntityName | EntityVersion | EntityContainerName |
|--|--|--|--|--|--|--|
| CREATE-TABLE | 2017-01-05 14:25:07 | InternalDb | MyTable1 | MyTable1 | v7.0 | InternalDb |
| RENAME-TABLE | 2017-01-13 10:30:01 | InternalDb | MyTable1 | MyTable2 | v8.0 | InternalDb |

| OriginalEntityState | UpdatedEntityState | ChangeCommand | Principal |
|--|--|--|--|
| | Name: MyTable1, Attributes: Name='[MyTable1].[col1]', Type='I32' | .create table MyTable1 (col1:int) | imike@fabrikam.com |
| | The database properties (too long to be displayed here) | .create database TestDB persist (@"https://imfbkm.blob.core.windows.net/md", @"https://imfbkm.blob.core.windows.net/data") | Microsoft Entra app id=76263cdb-abcd-545644e9c404 |
| Name: MyTable1, Attributes: Name='[MyTable1].[col1]', Type='I32' | Name: MyTable2, Attributes: Name='[MyTable1].[col1]', Type='I32' | .rename table MyTable1 to MyTable2 | rdmik@fabrikam.com |

| Item | Description |
|--|--|
| Event | The metadata event name |
| EventTimestamp | The event timestamp |
| Database | Metadata of this database was changed following the event |
| EntityName | The entity name that the operation was executed on, before the change |
| UpdatedEntityName | The new entity name after the change |
| EntityVersion | The new metadata version following the change |
| EntityContainerName | The entity container name (entity=column, container=table) |
| OriginalEntityState | The state of the entity (entity properties) before the change |
| UpdatedEntityState | The new state after the change |
| ChangeCommand | The executed management command that triggered the metadata change |
| Principal | The principal (user/app) that executed the management command |

.show journal

The .show journal command returns a list of metadata changes on databases, or on the cluster (or environment, in Fabric), that the user has admin access to.

Permissions

Everyone with permission can execute the command.

Results returned will include:

  • All journal entries of the user executing the command.

  • All journal entries of databases that the user executing the command has admin access to.

  • All cluster journal entries if the user executing the command is a Cluster AllDatabases Admin.

  • All journal entries specific to the environment level if the user executing the command has appropriate admin permissions.

.show database DatabaseName journal

The .show database DatabaseName journal command returns journal for the specific database metadata changes.

Permissions

Everyone with permission can execute the command.

Results returned include:

  • All journal entries of database DatabaseName if the user executing the command is a database admin in DatabaseName.
  • Otherwise, all the journal entries of database DatabaseName and of the user executing the command.
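
Because the journal is returned as a table, its output can be filtered with query operators. For example, the following sketch lists table renames from the past week in a hypothetical database; the database name is illustrative.

.show database MyDatabase journal
| where Event == "RENAME-TABLE" and EventTimestamp > ago(7d)
| project EventTimestamp, EntityName, UpdatedEntityName, Principal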

11.3 - System information

Learn how to use system information commands available to database admins and database monitors to explore usage, track operations and investigate ingestion failures.

This section summarizes commands that are available to Database Admins and Database Monitors to explore usage, track operations, and investigate ingestion failures. For more information on security roles, see Kusto role-based access control.

11.4 - Operations

11.5 - Queries and commands

11.6 - Statistics

12 - Workload groups

12.1 - Query consistency policy

Learn how to use the query consistency policy to control the consistency mode of queries.

A workload group’s query consistency policy allows specifying options that control the consistency mode of queries.

The policy object

Each option consists of:

  • A typed Value - the value of the limit.
  • IsRelaxable - a boolean value that defines if the option can be relaxed by the caller, as part of the request’s request properties. Default is true.

The following limits are configurable:

| Name | Type | Description | Supported values | Default value | Matching client request property |
|--|--|--|--|--|--|
| QueryConsistency | QueryConsistency | The consistency mode to use. | Strong, Weak, WeakAffinitizedByQuery, or WeakAffinitizedByDatabase | Strong | queryconsistency |
| CachedResultsMaxAge | timespan | The maximum age of cached query results that can be returned. | A non-negative timespan | null | query_results_cache_max_age |

Example

"QueryConsistencyPolicy": {
  "QueryConsistency": {
    "IsRelaxable": true,
    "Value": "Weak"
  },
  "CachedResultsMaxAge": {
    "IsRelaxable": true,
    "Value": "05:00:00"
  }
}
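
To apply such a policy, the serialized policy object is typically passed to a workload group management command. The following is a sketch against the default workload group, assuming the standard .alter-merge workload_group syntax with the policy JSON wrapped in triple-backtick string delimiters.

.alter-merge workload_group default ```
{
  "QueryConsistencyPolicy": {
    "QueryConsistency": { "IsRelaxable": true, "Value": "Weak" },
    "CachedResultsMaxAge": { "IsRelaxable": true, "Value": "05:00:00" }
  }
} ```

A caller can then relax a relaxable option on a per-request basis through the matching client request property, such as queryconsistency or query_results_cache_max_age.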

Monitoring

You can monitor the latency of the metadata snapshot age on nodes serving as weak consistency service heads by using the Weak consistency latency metric. For more information, see Query metrics.

12.2 - Request limits policy

Learn how to use the request limits policy to limit the resources used by the request during its execution.

A workload group’s request limits policy allows limiting the resources used by the request during its execution.

The policy object

Each limit consists of:

  • A typed Value - the value of the limit.
  • IsRelaxable - a boolean value that defines if the limit can be relaxed by the caller, as part of the request’s request properties.

The following limits are configurable:

| Property | Type | Description | Supported values | Matching client request property |
|---|---|---|---|---|
| DataScope | string | The query’s data scope. This value determines whether the query applies to all data or just the hot cache. | All, HotCache, or null | query_datascope |
| MaxMemoryPerQueryPerNode | long | The maximum amount of memory (in bytes) a query can allocate. | [1, 50% of a single node’s total RAM] | max_memory_consumption_per_query_per_node |
| MaxMemoryPerIterator | long | The maximum amount of memory (in bytes) a query operator can allocate. | [1, Min(32212254720, 50% of a single node’s total RAM)] | maxmemoryconsumptionperiterator |
| MaxFanoutThreadsPercentage | int | The percentage of threads on each node to fan out query execution to. When set to 100%, the cluster assigns all CPUs on each node. For example, 16 CPUs on a cluster deployed on Azure D14_v2 nodes. | [1, 100] | query_fanout_threads_percent |
| MaxFanoutNodesPercentage | int | The percentage of nodes on the cluster to fan out query execution to. Functions in a similar manner to MaxFanoutThreadsPercentage. | [1, 100] | query_fanout_nodes_percent |
| MaxResultRecords | long | The maximum number of records a request is allowed to return to the caller, beyond which the results are truncated. The truncation limit affects the final result of the query, as delivered back to the client. However, the truncation limit doesn’t apply to intermediate results of subqueries, such as those that result from having cross-cluster references. | [1, 9223372036854775807] | truncationmaxrecords |
| MaxResultBytes | long | The maximum data size (in bytes) a request is allowed to return to the caller, beyond which the results are truncated. The truncation limit affects the final result of the query, as delivered back to the client. However, the truncation limit doesn’t apply to intermediate results of subqueries, such as those that result from having cross-cluster references. | [1, 9223372036854775807] | truncationmaxsize |
| MaxExecutionTime | timespan | The maximum duration of a request. Notes: (1) This can be used to place more limits on top of the default limits on execution time, but not to extend them. (2) Timeout processing isn’t at the resolution of seconds; rather, it’s designed to prevent a query from running for minutes. (3) The time it takes to read the payload back at the client isn’t treated as part of the timeout; it depends on how quickly the caller pulls the data from the stream. (4) Total execution time can exceed the configured value if aborting execution takes longer to complete. | [00:00:00, 01:00:00] | servertimeout |

CPU resource usage

Queries can use all the CPU resources within the cluster. By default, when multiple queries run concurrently, the system employs a fair round-robin approach to distribute resources. This strategy is optimal for achieving high performance with ad-hoc queries.

However, there are scenarios where you might want to restrict the CPU resources allocated to a specific query, for instance, when running a background job that can tolerate higher latencies. The request limits policy provides the flexibility to specify a lower percentage of threads or nodes to use when executing distributed subquery operations. The default setting is 100%.
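
Individual requests can also specify a lower fan-out through the matching client request property, as long as the limit is relaxable. A minimal sketch, assuming a hypothetical table named MyTable:

// Restrict this query to 25% of the threads on each node.
set query_fanout_threads_percent = 25;
MyTable
| count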

The default workload group

The default workload group has the following policy defined by default. This policy can be altered.

{
  "DataScope": {
    "IsRelaxable": true,
    "Value": "All"
  },
  "MaxMemoryPerQueryPerNode": {
    "IsRelaxable": true,
    "Value": < 50% of a single node's total RAM >
  },
  "MaxMemoryPerIterator": {
    "IsRelaxable": true,
    "Value": 5368709120
  },
  "MaxFanoutThreadsPercentage": {
    "IsRelaxable": true,
    "Value": 100
  },
  "MaxFanoutNodesPercentage": {
    "IsRelaxable": true,
    "Value": 100
  },
  "MaxResultRecords": {
    "IsRelaxable": true,
    "Value": 500000
  },
  "MaxResultBytes": {
    "IsRelaxable": true,
    "Value": 67108864
  },
  "MaxExecutiontime": {
    "IsRelaxable": true,
    "Value": "00:04:00"
  }
}
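
To see which values are currently in effect for a given group, you can inspect the workload group definition. A minimal sketch (the exact output shape may vary):

.show workload_group default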

Example

The following JSON represents a custom request limits policy object:

{
  "DataScope": {
    "IsRelaxable": true,
    "Value": "HotCache"
  },
  "MaxMemoryPerQueryPerNode": {
    "IsRelaxable": true,
    "Value": 2684354560
  },
  "MaxMemoryPerIterator": {
    "IsRelaxable": true,
    "Value": 2684354560
  },
  "MaxFanoutThreadsPercentage": {
    "IsRelaxable": true,
    "Value": 50
  },
  "MaxFanoutNodesPercentage": {
    "IsRelaxable": true,
    "Value": 50
  },
  "MaxResultRecords": {
    "IsRelaxable": true,
    "Value": 1000
  },
  "MaxResultBytes": {
    "IsRelaxable": true,
    "Value": 33554432
  },
  "MaxExecutiontime": {
    "IsRelaxable": true,
    "Value": "00:01:00"
  }
}
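
A policy like this would typically be attached to a workload group with .create-or-alter workload_group or .alter-merge workload_group. The following is a hedged sketch that assumes a hypothetical workload group named MyBackgroundJobs and that the policy is nested under a RequestLimitsPolicy property:

.alter-merge workload_group MyBackgroundJobs ```
{
  "RequestLimitsPolicy": {
    "DataScope": {
      "IsRelaxable": true,
      "Value": "HotCache"
    },
    "MaxExecutionTime": {
      "IsRelaxable": true,
      "Value": "00:01:00"
    }
  }
}```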

12.3 - Request queuing policy

Learn how to use the request queuing policy to control queuing of requests for delayed execution.

A workload group’s request queuing policy controls queuing of requests for delayed execution, once a certain threshold of concurrent requests is exceeded.

Queuing of requests can reduce the number of throttling errors during times of peak activity. It does so by queuing incoming requests for up to a predefined short time period, while polling for available capacity during that period.

The policy can be defined only for workload groups that have a request rate limit policy limiting the maximum number of concurrent requests at the scope of the workload group.

Use the .alter-merge workload_group management command to enable request queuing.

The policy object

The policy includes a single property:

  • IsEnabled: A boolean indicating if the policy is enabled. The default value is false.
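
For example, enabling queuing for a hypothetical workload group named MyBatchWorkloads might look like the following sketch, assuming the policy is nested under a RequestQueuingPolicy property:

.alter-merge workload_group MyBatchWorkloads ```
{
  "RequestQueuingPolicy": {
    "IsEnabled": true
  }
}```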

12.4 - Request rate limit policy

Learn how to use the request rate limit policy to limit the number of concurrent requests classified into a workload group.

The workload group’s request rate limit policy lets you limit the number of concurrent requests classified into the workload group, per workload group or per principal.

Rate limits are enforced at the level defined by the workload group’s Request rate limits enforcement policy.

The policy object

A request rate limit policy has the following properties:

| Name | Supported values | Description |
|---|---|---|
| IsEnabled | true, false | Indicates if the policy is enabled or not. |
| Scope | WorkloadGroup, Principal | The scope to which the limit applies. |
| LimitKind | ConcurrentRequests, ResourceUtilization | The kind of the request rate limit. |
| Properties | Property bag | Properties of the request rate limit. |

Concurrent requests rate limit

A request rate limit of kind ConcurrentRequests includes the following property:

| Name | Type | Description | Supported values |
|---|---|---|---|
| MaxConcurrentRequests | int | The maximum number of concurrent requests. | [0, 10000] |

When a request exceeds the limit on maximum number of concurrent requests:

  • The request’s state, as presented by System information commands, will be Throttled.
  • The error message will include the origin of the throttling and the capacity that’s been exceeded.

The following table shows a few examples of concurrent requests that exceed the maximum limit and the error message that these requests return:

| Scenario | Error message |
|---|---|
| A throttled .create table command that was classified to the default workload group, which has a limit of 80 concurrent requests at the scope of the workload group. | The management command was aborted due to throttling. Retrying after some backoff might succeed. CommandType: ‘TableCreate’, Capacity: 80, Origin: ‘RequestRateLimitPolicy/WorkloadGroup/default’. |
| A throttled query that was classified to a workload group named MyWorkloadGroup, which has a limit of 50 concurrent requests at the scope of the workload group. | The query was aborted due to throttling. Retrying after some backoff might succeed. Capacity: 50, Origin: ‘RequestRateLimitPolicy/WorkloadGroup/MyWorkloadGroup’. |
| A throttled query that was classified to a workload group named MyWorkloadGroup, which has a limit of 10 concurrent requests at the scope of a principal. | The query was aborted due to throttling. Retrying after some backoff might succeed. Capacity: 10, Origin: ‘RequestRateLimitPolicy/WorkloadGroup/MyWorkloadGroup/Principal/aaduser=9e04c4f5-1abd-48d4-a3d2-9f58615b4724;6ccf3fe8-6343-4be5-96c3-29a128dd9570’. |
  • The HTTP response code will be 429. The subcode will be TooManyRequests.
  • The exception type will be QueryThrottledException for queries, and ControlCommandThrottledException for management commands.

Resource utilization rate limit

A request rate limit of kind ResourceUtilization includes the following properties:

| Name | Type | Description | Supported values |
|---|---|---|---|
| ResourceKind | ResourceKind | The resource to limit. When ResourceKind is TotalCpuSeconds, the limit is enforced based on post-execution reports of CPU utilization of completed requests. Requests that report utilization of 0.005 CPU seconds or lower aren’t counted. The limit (MaxUtilization) represents the total CPU seconds that can be consumed by requests within a specified time window (TimeWindow). For example, a user running ad-hoc queries may have a limit of 1000 CPU seconds per hour. If this limit is exceeded, subsequent queries are throttled, even if started concurrently, because the cumulative CPU seconds have surpassed the defined limit within the sliding window period. | RequestCount, TotalCpuSeconds |
| MaxUtilization | long | The maximum amount of the resource that can be utilized. | For RequestCount: [1, 16777215]; for TotalCpuSeconds: [1, 828000] |
| TimeWindow | timespan | The sliding time window during which the limit is applied. | [00:00:01, 01:00:00] |

When a request exceeds the limit on resources utilization:

  • The request’s state, as presented by System information commands, will be Throttled.
  • The error message will include the origin of the throttling and the quota that’s been exceeded.

The following table shows a few examples of requests that exceed the resource utilization rate limit and the error message that these requests return:

| Scenario | Error message |
|---|---|
| A throttled request that was classified to a workload group named Automated Requests, which has a limit of 1000 requests per hour at the scope of a principal. | The request was denied due to exceeding quota limitations. Resource: ‘RequestCount’, Quota: ‘1000’, TimeWindow: ‘01:00:00’, Origin: ‘RequestRateLimitPolicy/WorkloadGroup/Automated Requests/Principal/aadapp=9e04c4f5-1abd-48d4-a3d2-9f58615b4724;6ccf3fe8-6343-4be5-96c3-29a128dd9570’. |
| A throttled request that was classified to a workload group named Automated Requests, which has a limit of 2000 total CPU seconds per hour at the scope of the workload group. | The request was denied due to exceeding quota limitations. Resource: ‘TotalCpuSeconds’, Quota: ‘2000’, TimeWindow: ‘01:00:00’, Origin: ‘RequestRateLimitPolicy/WorkloadGroup/Automated Requests’. |
  • The HTTP response code will be 429. The subcode will be TooManyRequests.
  • The exception type will be QuotaExceededException.

How consistency affects rate limits

With strong consistency, the default limit on maximum concurrent requests depends on the SKU of the cluster, and is calculated as: Cores-Per-Node x 10. For example, a cluster that’s set up with Azure D14_v2 nodes, where each node has 16 vCores, will have a default limit of 16 x 10 = 160.

With weak consistency, the effective default limit on maximum concurrent requests depends on the SKU of the cluster and the number of query heads, and is calculated as: Cores-Per-Node x 10 x Number-Of-Query-Heads. For example, a cluster that’s set up with Azure D14_v2 nodes and 5 query heads, where each node has 16 vCores, will have an effective default limit of 16 x 10 x 5 = 800.

For more information, see Query consistency.

The default workload group

The default workload group has the following policy defined by default. This policy can be altered.

[
  {
    "IsEnabled": true,
    "Scope": "WorkloadGroup",
    "LimitKind": "ConcurrentRequests",
    "Properties": {
      "MaxConcurrentRequests": < Cores-Per-Node x 10 >
    }
  }
]

Examples

The following policies allow up to:

  • 500 concurrent requests for the workload group.
  • 25 concurrent requests per principal.
  • 50 requests per principal per hour.
[
  {
    "IsEnabled": true,
    "Scope": "WorkloadGroup",
    "LimitKind": "ConcurrentRequests",
    "Properties": {
      "MaxConcurrentRequests": 500
    }
  },
  {
    "IsEnabled": true,
    "Scope": "Principal",
    "LimitKind": "ConcurrentRequests",
    "Properties": {
      "MaxConcurrentRequests": 25
    }
  },
  {
    "IsEnabled": true,
    "Scope": "Principal",
    "LimitKind": "ResourceUtilization",
    "Properties": {
      "ResourceKind": "RequestCount",
      "MaxUtilization": 50,
      "TimeWindow": "01:00:00"
    }
  }
]

The following policies will block all requests classified to the workload group:

[
  {
    "IsEnabled": true,
    "Scope": "WorkloadGroup",
    "LimitKind": "ConcurrentRequests",
    "Properties": {
      "MaxConcurrentRequests": 0
    }
  }
]

12.5 - Request rate limits enforcement policy

Learn how to use the request rate limits enforcement policy to enforce request rate limits.

A workload group’s request rate limits enforcement policy controls how request rate limits are enforced.

The policy object

A request rate limits enforcement policy has the following properties:

| Name | Supported values | Default value | Description |
|---|---|---|---|
| QueriesEnforcementLevel | Cluster, QueryHead | QueryHead | Indicates the enforcement level for queries. |
| CommandsEnforcementLevel | Cluster, Database | Database | Indicates the enforcement level for commands. |
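
The following is a hedged sketch of setting this policy on the default workload group, assuming the property is named RequestRateLimitsEnforcementPolicy in the workload group object:

.alter-merge workload_group default ```
{
  "RequestRateLimitsEnforcementPolicy": {
    "QueriesEnforcementLevel": "QueryHead",
    "CommandsEnforcementLevel": "Database"
  }
}```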

Request rate limits enforcement level

Request rate limits can be enforced at one of the following levels:

  • Cluster:

    • Rate limits are enforced by the single cluster admin node.
  • Database:

    • Rate limits are enforced by the database admin node that manages the database the request was sent to.
    • If there are multiple database admin nodes, the configured rate limit is effectively multiplied by the number of database admin nodes.
  • QueryHead:

    • Rate limits for queries are enforced by the query head node that the query was routed to.
    • This option affects queries that are sent with either strong or weak query consistency.
      • Strongly consistent queries run on the database admin node, and the configured rate limit is effectively multiplied by the number of database admin nodes.
      • For weakly consistent queries, the configured rate limit is effectively multiplied by the number of query head nodes.
    • This option doesn’t apply to management commands.

Examples

Setup

  • The cluster has 10 nodes as follows:

    • one cluster admin node.
    • two database admin nodes (each manages 50% of the cluster’s databases).
    • 50% of the tail nodes (5 out of 10) can serve as query heads for weakly consistent queries.
  • The default workload group is defined with the following policies:

    "RequestRateLimitPolicies": [
        {
            "IsEnabled": true,
            "Scope": "WorkloadGroup",
            "LimitKind": "ConcurrentRequests",
            "Properties": {
                "MaxConcurrentRequests": 200
            }
        }
    ],
    "RequestRateLimitsEnforcementPolicy": {
        "QueriesEnforcementLevel": "QueryHead",
        "CommandsEnforcementLevel": "Database"
    }

Effective rate limits

The effective rate limits for the default workload group are:

  • The maximum number of concurrent cluster-scoped management commands is 200.

  • The maximum number of concurrent database-scoped management commands is
    2 (database admin nodes) x 200 (max per admin node) = 400.

  • The maximum number of concurrent strongly consistent queries is
    2 (database admin nodes) x 200 (max per admin node) = 400.

  • The maximum number of concurrent weakly consistent queries is
    5 (query heads) x 200 (max per query head) = 1000.

12.6 - Workload groups

Learn how to use workload groups to govern incoming requests to the cluster.

Workload groups allow you to group together sets of management commands and queries based on shared characteristics, and apply policies to control per-request limits and request rate limits for each of these groups.

Together with workload group policies, workload groups serve as a resource governance system for incoming requests to the cluster. When a request is initiated, it gets classified into a workload group. The classification is based on a user-defined function defined as part of a request classification policy. The request follows the policies assigned to the designated workload group throughout its execution.

Workload groups are defined at the cluster level, and up to 10 custom groups can be defined in addition to the three built-in workload groups.

Use cases for custom workload groups

The following list covers some common use cases for creating custom workload groups:

  • Protect against runaway queries: Create a workload group with a requests limits policy to set restrictions on resource usage and parallelism during query execution. For example, this policy can regulate result set size, memory per iterator, memory per node, execution time, and CPU resource usage.

  • Control the rate of requests: Create a workload group with a request rate limit policy to manage the behavior of concurrent requests from a specific principal or application. This policy can restrict the number of concurrent requests, request count within a time period, and total CPU seconds per time period. While your cluster comes with default limits, such as query limits, you have the flexibility to adjust these limits based on your requirements.

  • Create shared environments: Imagine a scenario where you have 3 different customer teams running queries and commands on a shared cluster, possibly even accessing shared databases. If you’re billing these teams based on their resource usage, you can create three distinct workload groups, each with unique limits. These workload groups would allow you to effectively manage and monitor the resource usage of each customer team.

  • Monitor resources utilization: Workload groups can help you create periodic reports on the resource consumption of a given principal or application. For instance, if these principals represent different clients, such reports can facilitate accurate billing. For more information, see Monitor requests by workload group.

Create and manage workload groups

Use the workload group management commands to manage workload groups and their policies.
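
As a non-authoritative sketch, the commonly used workload group management commands include the following; the JSON bodies are placeholders:

// List all workload groups and their policies.
.show workload_groups
// Show a single workload group.
.show workload_group default
// Create a new workload group, or replace an existing one.
.create-or-alter workload_group MyGroup ```{ ... }```
// Merge policy changes into an existing workload group.
.alter-merge workload_group MyGroup ```{ ... }```
// Remove a custom workload group.
.drop workload_group MyGroup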

Workload group policies

The following policies can be defined per workload group: the request limits policy, the request rate limit policy, the request rate limits enforcement policy, the request queuing policy, and the query consistency policy.

Built-in workload groups

The pre-defined workload groups are the default workload group, the internal workload group, and the $materialized-views workload group.

Default workload group

Requests are classified into the default group under these conditions:

  • There are no criteria to classify a request.
  • An attempt was made to classify the request into a non-existent group.
  • A general classification failure has occurred.

You can:

  • Change the criteria used for routing these requests.
  • Change the policies that apply to the default workload group.
  • Classify requests into the default workload group.

To monitor what gets classified to the default workload group, see Monitor requests by workload group.

Internal workload group

The internal workload group is populated with requests that are for internal use only.

You can’t:

  • Change the criteria used for routing these requests.
  • Change the policies that apply to the internal workload group.
  • Classify requests into the internal workload group.

To monitor what gets classified to the internal workload group, see Monitor requests by workload group.

Materialized views workload group

The $materialized-views workload group applies to the materialized views materialization process. For more information on how materialized views work, see Materialized views overview.

You can change the following values in the workload group’s request limits policy:

  • MaxMemoryPerQueryPerNode
  • MaxMemoryPerIterator
  • MaxFanoutThreadsPercentage
  • MaxFanoutNodesPercentage
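
For example, raising the memory available to the materialization process might look like the following sketch; the group name contains a $ character and is therefore bracketed, and the value shown is illustrative only:

.alter-merge workload_group ['$materialized-views'] ```
{
  "RequestLimitsPolicy": {
    "MaxMemoryPerQueryPerNode": {
      "IsRelaxable": false,
      "Value": 10737418240
    }
  }
}```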

Monitor requests by workload group

System commands indicate the workload group into which a request was classified. You can use these commands to aggregate resources utilization by workload group for completed requests.
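
For example, a hedged sketch of aggregating the last day’s completed requests by workload group with .show commands-and-queries (the WorkloadGroup, StartedOn, and State column names are assumed):

.show commands-and-queries
| where StartedOn > ago(1d)
| summarize RequestCount = count() by WorkloadGroup, State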

The same information can also be viewed and analyzed in Azure Monitor insights.

12.7 - Request classification policy

12.7.1 - Request classification policy

Learn how to use the request classification policy to assign incoming requests to a workload group.

The classification process assigns incoming requests to a workload group, based on the characteristics of the requests. Tailor the classification logic by writing a user-defined function, as part of a cluster-level request classification policy.

In the absence of an enabled request classification policy, all requests are classified into the default workload group.

Policy object

The policy has the following properties:

  • IsEnabled: bool - Indicates if the policy is enabled or not.
  • ClassificationFunction: string - The body of the function to use for classifying requests.

Classification function

The classification of incoming requests is based on a user-defined function. The results of the function are used to classify requests into existing workload groups.

The user-defined function has the following characteristics and behaviors:

  • If IsEnabled is set to true in the policy, the user-defined function is evaluated for every new request.
  • The user-defined function gives workload group context for the request for the full lifetime of the request.
  • The request is given the default workload group context in the following situations:
    • The user-defined function returns an empty string, default, or the name of a nonexistent workload group.
    • The function fails for any reason.
  • Only one user-defined function can be designated at any given time.

Requirements and limitations

A classification function:

  • Must return a single scalar value of type string. That is the name of the workload group to assign the request to.
  • Must not reference any other entity (database, table, or function).
    • Specifically, it can’t use the following functions and operators:
      • cluster()
      • database()
      • table()
      • external_table()
      • externaldata
  • Has access to a special dynamic symbol, a property-bag named request_properties, with the following properties:

| Name | Type | Description | Examples |
|---|---|---|---|
| current_database | string | The name of the request database. | "MyDatabase" |
| current_application | string | The name of the application that sent the request. | "Kusto.Explorer", "KusWeb" |
| current_principal | string | The fully qualified name of the principal identity that sent the request. | "aaduser=1793eb1f-4a18-418c-be4c-728e310c86d3;83af1c0e-8c6d-4f09-b249-c67a2e8fda65" |
| query_consistency | string | For queries: the consistency of the query - strongconsistency or weakconsistency. This property is set by the caller as part of the request’s request properties. The client request property to set is queryconsistency. | "strongconsistency", "weakconsistency" |
| request_description | string | Custom text that the author of the request can include. The text is set by the caller as part of the request’s client request properties. The client request property to set is request_description. | "Some custom description"; automatically populated for dashboards: "dashboard:{dashboard_id};version:{version};sourceId:{source_id};sourceType:{tile/parameter}" |
| request_text | string | The obfuscated text of the request. Obfuscated string literals included in the query text are replaced by multiple star (*) characters. Note: only the leading 65,536 characters of the request text are evaluated. | ".show version" |
| request_type | string | The type of the request - Command or Query. | "Command", "Query" |

Examples

A single workload group

iff(request_properties.current_application == "Kusto.Explorer" and request_properties.request_type == "Query",
    "Ad-hoc queries",
    "default")

Multiple workload groups

case(current_principal_is_member_of('aadgroup=somesecuritygroup@contoso.com'), "First workload group",
     request_properties.current_database == "MyDatabase" and request_properties.current_principal has 'aadapp=', "Second workload group",
     request_properties.current_application == "Kusto.Explorer" and request_properties.request_type == "Query", "Third workload group",
     request_properties.current_application == "Kusto.Explorer", "Third workload group",
     request_properties.current_application == "KustoQueryRunner", "Fourth workload group",
     request_properties.request_description == "this is a test", "Fifth workload group",
     hourofday(now()) between (17 .. 23), "Sixth workload group",
     "default")

Management commands

Use the following management commands to manage a cluster’s request classification policy.

| Command | Description |
|---|---|
| .alter cluster request classification policy | Alters the cluster’s request classification policy |
| .alter-merge cluster request classification policy | Enables or disables the cluster’s request classification policy |
| .delete cluster request classification policy | Deletes the cluster’s request classification policy |
| .show cluster request classification policy | Shows the cluster’s request classification policy |
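
The following is a hedged sketch of enabling a classification function with the .alter cluster policy request_classification command; the workload group name is assumed to exist:

.alter cluster policy request_classification '{"IsEnabled": true}' <|
    iff(request_properties.current_application == "Kusto.Explorer" and request_properties.request_type == "Query",
        "Ad-hoc queries",
        "default")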

12.8 - Workload group commands

13 - Management commands overview

This article describes management commands.

This article describes the management commands, also known as control commands, used to manage Kusto. Management commands are requests to the service to retrieve information that isn’t necessarily data in the database tables, or to modify the service state.

Differentiating management commands from queries

Kusto uses three mechanisms to differentiate queries and management commands: at the language level, at the protocol level, and at the API level. This is done for security purposes.

At the language level, the first character of the text of a request determines if the request is a management command or a query. Management commands must start with the dot (.) character, and no query may start with that character.

At the protocol level, different HTTP/HTTPS endpoints are used for control commands as opposed to queries.

At the API level, different functions are used to send management commands as opposed to queries.

Combining queries and management commands

Management commands can reference queries (but not vice-versa) or other management commands. The following scenarios are supported:

  • AdminThenQuery: A management command is executed, and its result (represented as a temporary data table) serves as the input to a query.
  • AdminFromQuery: Either a query or a .show admin command is executed, and its result (represented as a temporary data table) serves as the input to a management command.

Note that in all cases, the entire combination is technically a management command, not a query, so the text of the request must start with a dot (.) character, and the request must be sent to the management endpoint of the service.

Also note that query statements appear within the query part of the text (they can’t precede the command itself).

AdminThenQuery is indicated in one of two ways:

  • By using a pipe (|) character, so that the query treats the results of the management command as it would the results of any other data-producing query operator.
  • By using a semicolon (;) character, which stores the results of the management command in a special symbol called $command_results that can then be used in the query any number of times.

For example:

// 1. Using pipe: Count how many tables are in the database-in-scope:
.show tables
| count

// 2. Using semicolon: Count how many tables are in the database-in-scope:
.show tables;
$command_results
| count

// 3. Using semicolon, and including a let statement:
.show tables;
let useless=(n:string){strcat(n,'-','useless')};
$command_results | extend LastColumn=useless(TableName)

AdminFromQuery is indicated by the <| character combination. For example, the following command first executes a query that produces a table with a single column (named str, of type string) and a single row, and then writes the result as a table named MyTable in the database in context:

.set MyTable <|
let text="Hello, World!";
print str=text