1 - Policies overview

Learn which policies are available for management.

The following table provides an overview of the policies for managing your environment:

Policy | Description
Auto delete policy | Sets an expiry date for the table. The table is automatically deleted at this expiry time.
Cache policy | Defines how to prioritize resources. Allows customers to differentiate between hot data cache and cold data cache.
Callout policy | Manages the authorized domains for external calls.
Capacity policy | Controls the compute resources of data management operations.
Encoding policy | Defines how data is encoded, compressed, and indexed.
Extent tags retention policy | Controls the mechanism that automatically removes extent tags from tables.
Ingestion batching policy | Groups multiple data ingestion requests into batches for more efficient processing.
Ingestion time policy | Adds a hidden datetime column to the table that records the time of ingestion.
ManagedIdentity policy | Controls which managed identities can be used for what purposes.
Merge policy | Defines rules for merging data from different extents into a single extent.
Mirroring policy | Allows you to manage your mirroring policy and mirroring policy operations.
Partitioning policy | Defines rules for partitioning extents for a specific table or a materialized view.
Retention policy | Controls the mechanism that automatically removes data from tables or materialized views.
Restricted view access policy | Adds an extra layer of permission requirements for principals to access and view the table.
Row level security policy | Defines rules for access to rows in a table based on group membership or execution context.
Row order policy | Maintains a specific order for rows within an extent.
Sandbox policy | Controls the usage and behavior of sandboxes, which are isolated environments for query execution.
Sharding policy | Defines rules for how extents are created.
Streaming ingestion policy | Configurations for streaming data ingestion.
Update policy | Allows for data to be appended to a target table upon adding data to a source table.
Query weak consistency policy | Controls the level of consistency for query results.

2 - Auto delete

2.1 - Auto delete policy

Learn about the auto delete policy to set an expiry date for the table.

An auto delete policy on a table sets an expiry date for the table. The table is automatically deleted at this expiry time. Unlike the retention policy, which determines when data (extents) are removed from a table, the auto delete policy drops the entire table.

The auto delete policy can be useful for temporary staging tables. Temporary staging tables are used for data preparation, until the data is moved to its permanent location. We recommend explicitly dropping temporary tables when they’re no longer needed. Only use the auto delete policy as a fallback mechanism in case the explicit deletion doesn’t occur.

Policy object

An auto delete policy includes the following properties:

  • ExpiryDate:

    • Date and time value indicating when the table should be deleted.
    • The deletion time is imprecise, and could occur a few hours later than the time specified in the ExpiryDate property.
    • The value specified can’t be null and it must be greater than the current time.
  • DeleteIfNotEmpty:

    • Boolean value indicating whether the table should be dropped even if there are still extents in it.
    • Defaults to false.

For more information, see auto delete policy commands.
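
For example, the policy can be applied with the auto delete policy management command. The following is a minimal sketch, assuming a hypothetical table named MyTable; the expiry date and DeleteIfNotEmpty values are illustrative only:

// Drop MyTable at the specified expiry time, even if it still contains extents.
.alter table MyTable policy auto_delete @'{"ExpiryDate": "2030-01-01", "DeleteIfNotEmpty": true}'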

3 - Caching

3.1 - Caching policy (hot and cold cache)

This article describes caching policy (hot and cold cache).

To ensure fast query performance, a multi-tiered data cache system is used. Data is stored in reliable storage but parts of it are cached on processing nodes, SSD, or even in RAM for faster access.

The caching policy allows you to choose which data should be cached. You can differentiate between hot data cache and cold data cache by setting a caching policy on hot data. Hot data is kept in local SSD storage for faster query performance, while cold data is stored in reliable storage, which is cheaper but slower to access.

The cache uses 95% of the local SSD disk for hot data. If there isn’t enough space, the most recent data is preferentially kept in the cache. The remaining 5% is used for data that isn’t categorized as hot. This design ensures that queries loading lots of cold data won’t evict hot data from the cache.

The best query performance is achieved when all ingested data is cached. However, certain data might not warrant the expense of being kept in the hot cache. For instance, infrequently accessed old log records might be considered less crucial. In such cases, teams often opt for lower querying performance over paying to keep the data warm.

Use management commands to alter the caching policy at the cluster, database, table, or materialized view level.
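
For example, the following sketch sets the hot cache span with the caching policy commands, assuming a hypothetical table MyTable and database MyDatabase; the durations are illustrative:

// Keep the last 7 days of MyTable's data in the hot cache.
.alter table MyTable policy caching hot = 7d

// Keep the last 30 days hot for all tables in MyDatabase that don't override the policy.
.alter database MyDatabase policy caching hot = 30d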

How caching policy is applied

When data is ingested, the system keeps track of the date and time of the ingestion, and of the extent that was created. The extent’s ingestion date and time value (or maximum value, if an extent was built from multiple preexisting extents), is used to evaluate the caching policy.

By default, the effective policy is null, which means that all the data is considered hot. A null policy at the table level means that the policy is inherited from the database. A non-null table-level policy overrides a database-level policy.

Scoping queries to hot cache

When running queries, you can limit the scope to only query data in hot cache.

There are several query possibilities:

  • Add a client request property called query_datascope to the query. Possible values: default, all, and hotcache.
  • Use a set statement in the query text: set query_datascope='...'. Possible values are the same as for the client request property.
  • Add a datascope=... text immediately after a table reference in the query body. Possible values are all and hotcache.

The default value indicates use of the default settings, which determine that the query should cover all data.

If there’s a discrepancy between the different methods, then set takes precedence over the client request property. Specifying a value for a table reference takes precedence over both.

For example, in the following query, all table references use hot cache data only, except for the second reference to “T” that is scoped to all the data:

set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X

Caching policy vs retention policy

Caching policy is independent of retention policy:

  • Caching policy defines how to prioritize resources. Queries for important data are faster.
  • Retention policy defines the extent of the queryable data in a table/database (specifically, SoftDeletePeriod).

Configure this policy to achieve the optimal balance between cost and performance, based on the expected query pattern.

Example:

  • SoftDeletePeriod = 56d
  • hot cache policy = 28d

In the example, the last 28 days of data is stored on the SSD and the additional 28 days of data is stored in Azure blob storage. You can run queries on the full 56 days of data.

4 - Callout

4.1 - Callout policy

Learn how to update a cluster’s callout policy to manage authorized domains for external calls.

Your cluster can communicate with external services in many different scenarios. Cluster administrators can manage the authorized domains for external calls by updating the cluster’s callout policy.

Supported properties of a callout

A callout policy is composed of the following properties:

Name | Type | Description
CalloutType | string | Defines the type of callout, and can be one of the types listed in callout types.
CalloutUriRegex | string | Specifies the regular expression whose matches represent the domain of resources of the callout domain.
CanCall | bool | Indicates whether the callout is permitted to make external calls.

Types of callout

Callout policies are managed at cluster-level and are classified into the following types:

Callout policy type | Description
kusto | Controls cross-cluster queries.
sql | Controls the SQL plugin.
mysql | Controls the MySQL plugin.
postgresql | Controls the PostgreSQL plugin.
azure_digital_twins | Controls the Azure Digital Twins plugin.
cosmosdb | Controls the Cosmos DB plugin.
sandbox_artifacts | Controls sandboxed plugins (python and R).
external_data | Controls access to external data through external tables or the externaldata operator.
webapi | Controls access to HTTP endpoints.
azure_openai | Controls calls to Azure OpenAI plugins such as the ai_embed_text embedding plugin.

Predefined callout policies

The following table shows a set of predefined callout policies that are preconfigured on your cluster to enable callouts to selected services:

Service | Designation | Permitted domains
Kusto | Cross cluster queries | [a-z0-9]{3,22}\\.(\\w+\\.)?kusto(mfa)?\\.windows\\.net/?$
Kusto | Cross cluster queries | ^https://[a-z0-9]{3,22}\.[a-z0-9-]{1,50}\.(kusto\.azuresynapse
Kusto | Cross cluster queries | ^https://([A-Za-z0-9]+\.)?(ade
Azure DB | SQL requests | [a-z0-9][a-z0-9\\-]{0,61}[a-z0-9]?\\.database\\.windows\\.net/?$
Synapse Analytics | SQL requests | [a-z0-9-]{0,61}?(-ondemand)?\\.sql\\.azuresynapse(-dogfood)?\\.net/?$
External Data | External data | .*
Azure Digital Twins | Azure Digital Twins | [A-Za-z0-9\\-]{3,63}\\.api\\.[A-Za-z0-9]+\\.digitaltwins\\.azure\\.net/?$

You can view other predefined policies on your cluster with the following query:

.show cluster policy callout 
| where EntityType == 'Cluster immutable policy'
| project Policy

Remarks

If an external resource of a given type matches more than one policy defined for that type, and at least one of the matched policies has its CanCall property set to false, access to the resource is denied.
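
As an illustration, the following sketch permits SQL callouts to a hypothetical server by adding a policy with the .alter-merge cluster policy callout command; the CalloutUriRegex value is an assumption and should be replaced with your own domain pattern:

// Permit SQL callouts to a specific SQL Server domain.
.alter-merge cluster policy callout @'[{"CalloutType": "sql", "CalloutUriRegex": "my-sql-server\\.database\\.windows\\.net/?$", "CanCall": true}]'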

5 - Capacity

5.1 - Capacity policy

Learn how to use the capacity policy to control the compute resources of data management operations on a cluster.

A capacity policy is used for controlling the compute resources of data management operations on the cluster.

The capacity policy object

The capacity policy is made of the following components:

To view the capacity of your cluster, use the .show capacity command.

Ingestion capacity

Property | Type | Description
ClusterMaximumConcurrentOperations | long | The maximum number of concurrent ingestion operations allowed in a cluster. This value caps the total ingestion capacity, as shown in the following formula.
CoreUtilizationCoefficient | real | Determines the percentage of cores to use in the ingestion capacity calculation.

Formula

The .show capacity command returns the cluster’s ingestion capacity based on the following formula:

Minimum(ClusterMaximumConcurrentOperations , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))
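
For example, assuming a hypothetical 10-node cluster with 16 cores per node and the default values (ClusterMaximumConcurrentOperations of 512 and CoreUtilizationCoefficient of 0.75), the ingestion capacity would be Minimum(512, 10 * Maximum(1, 16 * 0.75)) = Minimum(512, 120) = 120 concurrent ingestion operations.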

Extents merge capacity

Property | Type | Description
MinimumConcurrentOperationsPerNode | long | The minimal number of concurrent extents merge/rebuild operations on a single node. Default is 1.
MaximumConcurrentOperationsPerNode | long | The maximum number of concurrent extents merge/rebuild operations on a single node. Default is 5.
ClusterMaximumConcurrentOperations | long | The maximum number of concurrent extents merge/rebuild operations allowed in a cluster. This value caps the total merge capacity.

Formula

The .show capacity command returns the cluster’s extents merge capacity based on the following formula:

Minimum(Number of nodes in cluster * Concurrent operations per node, ClusterMaximumConcurrentOperations)

The effective value for Concurrent operations per node is automatically adjusted by the system in the range [MinimumConcurrentOperationsPerNode,MaximumConcurrentOperationsPerNode], as long as the success rate of the merge operations is 90% or higher.

Extents purge rebuild capacity

Property | Type | Description
MaximumConcurrentOperationsPerNode | long | The maximum number of concurrent rebuild extents for purge operations on a single node.

Formula

The .show capacity command returns the cluster’s extents purge rebuild capacity based on the following formula:

Number of nodes in cluster x MaximumConcurrentOperationsPerNode

Export capacity

Property | Type | Description
ClusterMaximumConcurrentOperations | long | The maximum number of concurrent export operations in a cluster. This value caps the total export capacity, as shown in the following formula.
CoreUtilizationCoefficient | real | Determines the percentage of cores to use in the export capacity calculation.

Formula

The .show capacity command returns the cluster’s export capacity based on the following formula:

Minimum(ClusterMaximumConcurrentOperations , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))

Extents partition capacity

Property | Type | Description
ClusterMinimumConcurrentOperations | long | The minimal number of concurrent extents partition operations in a cluster. Default is 1.
ClusterMaximumConcurrentOperations | long | The maximum number of concurrent extents partition operations in a cluster. Default is 32.

The effective value for Concurrent operations is automatically adjusted by the system in the range [ClusterMinimumConcurrentOperations,ClusterMaximumConcurrentOperations], as long as the success rate of the partitioning operations is 90% or higher.

Materialized views capacity policy

The policy can be used to change concurrency settings for materialized views. Changing the materialized views capacity policy can be useful when there’s more than a single materialized view defined on a cluster.

Property | Type | Description
ClusterMinimumConcurrentOperations | long | The minimal number of concurrent materialization operations in a cluster. Default is 1.
ClusterMaximumConcurrentOperations | long | The maximum number of concurrent materialization operations in a cluster. Default is 10.

By default, only a single materialization runs concurrently (see how materialized views work). The system adjusts the current concurrency in the range [ClusterMinimumConcurrentOperations, ClusterMaximumConcurrentOperations], based on the number of materialized views in the cluster and the cluster’s CPU. You can increase or decrease concurrency by altering this policy. For example, if the cluster has 10 materialized views, setting ClusterMinimumConcurrentOperations to five ensures that at least five of them can materialize concurrently. You can view the effective value for the current concurrency using the .show capacity command.

Stored query results capacity

Property | Type | Description
MaximumConcurrentOperationsPerDbAdmin | long | The maximum number of concurrent ingestion operations in a cluster admin node.
CoreUtilizationCoefficient | real | Determines the percentage of cores to use in the stored query results creation calculation.

Formula

The .show capacity command returns the cluster’s stored query results creation capacity based on the following formula:

Minimum(MaximumConcurrentOperationsPerDbAdmin , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))

Streaming ingestion post processing capacity

Property | Type | Description
MaximumConcurrentOperationsPerNode | long | The maximum number of concurrent streaming ingestion post processing operations on each cluster node.

Formula

The .show capacity command returns the cluster’s streaming ingestion post processing capacity based on the following formula:

Number of nodes in cluster x MaximumConcurrentOperationsPerNode

Purge storage artifacts cleanup capacity

Property | Type | Description
MaximumConcurrentOperationsPerCluster | long | The maximum number of concurrent purge storage artifacts cleanup operations on the cluster.

Formula

The .show capacity command returns the cluster’s purge storage artifacts cleanup capacity based on the following formula:

MaximumConcurrentOperationsPerCluster

Periodic storage artifacts cleanup capacity

Property | Type | Description
MaximumConcurrentOperationsPerCluster | long | The maximum number of concurrent periodic storage artifacts cleanup operations on the cluster.

Formula

The .show capacity command returns the cluster’s periodic storage artifacts cleanup capacity based on the following formula:

MaximumConcurrentOperationsPerCluster

Query Acceleration capacity

Property | Type | Description
ClusterMaximumConcurrentOperations | long | The maximum number of concurrent query acceleration caching operations in a cluster. This value caps the total query acceleration caching capacity, as shown in the following formula.
CoreUtilizationCoefficient | real | Determines the percentage of cores to use in the query acceleration caching capacity calculation.

Formula

The .show capacity command returns the cluster’s query acceleration caching capacity based on the following formula:

Minimum(ClusterMaximumConcurrentOperations , Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))

Defaults

The default capacity policy has the following JSON representation:

{
  "IngestionCapacity": {
    "ClusterMaximumConcurrentOperations": 512,
    "CoreUtilizationCoefficient": 0.75
  },
  "ExtentsMergeCapacity": {
    "MinimumConcurrentOperationsPerNode": 1,
    "MaximumConcurrentOperationsPerNode": 3
  },
  "ExtentsPurgeRebuildCapacity": {
    "MaximumConcurrentOperationsPerNode": 1
  },
  "ExportCapacity": {
    "ClusterMaximumConcurrentOperations": 100,
    "CoreUtilizationCoefficient": 0.25
  },
  "ExtentsPartitionCapacity": {
    "ClusterMinimumConcurrentOperations": 1,
    "ClusterMaximumConcurrentOperations": 32
  },
  "MaterializedViewsCapacity": {
    "ClusterMaximumConcurrentOperations": 1,
    "ExtentsRebuildCapacity": {
      "ClusterMaximumConcurrentOperations": 50,
      "MaximumConcurrentOperationsPerNode": 5
    }
  },
  "StoredQueryResultsCapacity": {
    "MaximumConcurrentOperationsPerDbAdmin": 250,
    "CoreUtilizationCoefficient": 0.75
  },
  "StreamingIngestionPostProcessingCapacity": {
    "MaximumConcurrentOperationsPerNode": 4
  },
  "PurgeStorageArtifactsCleanupCapacity": {
    "MaximumConcurrentOperationsPerCluster": 2
  },
  "PeriodicStorageArtifactsCleanupCapacity": {
    "MaximumConcurrentOperationsPerCluster": 2
  },
  "QueryAccelerationCapacity": {
    "ClusterMaximumConcurrentOperations": 100,
    "CoreUtilizationCoefficient": 0.5
  }
}
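
Individual components can be changed without rewriting the entire policy by using the .alter-merge cluster policy capacity command. The following is a sketch only; the values are illustrative, not recommendations:

// Adjust only the per-node concurrency bounds for extents merge operations;
// the other capacity components keep their current values.
.alter-merge cluster policy capacity @'{"ExtentsMergeCapacity": {"MinimumConcurrentOperationsPerNode": 2, "MaximumConcurrentOperationsPerNode": 4}}'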

Management commands

Management commands throttling

Kusto limits the number of concurrent requests for the following user-initiated commands:

When the cluster detects that an operation exceeded the limit on concurrent requests:

  • The command’s state, as presented by System information commands, is Throttled.
  • The error message includes the command type, the origin of the throttling, and the capacity that was exceeded. For example:
    • The management command was aborted due to throttling. Retrying after some backoff might succeed. CommandType: 'TableSetOrAppend', Capacity: 18, Origin: 'CapacityPolicy/Ingestion'.
  • The HTTP response code is 429. The subcode is TooManyRequests.
  • The exception type is ControlCommandThrottledException.

6 - Encoding policy

6.1 - Encoding policy

This article describes the encoding policy.

The encoding policy defines how data is encoded, compressed, and indexed. This policy applies to all columns of stored data. A default encoding policy is applied based on the column’s data type, and a background process adjusts the encoding policy automatically if necessary.

Scenarios

We recommend maintaining the default policy except in specific scenarios, where modifying a column’s default encoding policy can be useful to fine-tune the performance/COGS trade-off. For example:

  • The default indexing applied to string columns is built for term searches. If you only query for specific values in the column, COGS might be reduced if the index is simplified using the encoding profile Identifier. For more information, see the string data type.
  • You can disable indexing for fields that are never queried or don’t need fast searches. Use the BigObject profile to turn off the indexes and increase the maximal value size in dynamic or string columns. For example, use this profile to store HLL values returned by the hll() function.

How it works

Encoding policy changes do not affect data that has already been ingested. Only new ingestion operations will be performed according to the new policy. The encoding policy applies to individual columns in a table, but can be set at the column level, table level (affecting all columns of the table), or database level.
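
For example, a column-level change can be made with the encoding policy management command. The following is a minimal sketch, assuming a hypothetical table MyTable with a string column TenantId that is only ever filtered on specific values:

// Switch the column to the Identifier encoding profile to simplify its index.
.alter column ['MyTable'].['TenantId'] policy encoding type='identifier'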

7 - Extent tags policy

7.1 - Extent tags retention policy

This article describes extent tags retention policies.

The extent tags retention policy controls the mechanism that automatically removes extent tags from tables, based on the age of the extents.

We recommend removing any tags that are no longer helpful, or that were used temporarily as part of an ingestion pipeline, because they can prevent the system from reaching optimal performance. For example: old drop-by: tags, which prevent merging extents together.

The policy can be set at the table-level, or at the database-level. A database-level policy applies to all tables in the database that don’t override the policy.

The policy object

The extent tags retention policy is an array of policy objects. Each object includes the following properties:

Property name | Type | Description | Example
TagPrefix | string | The prefix of the tags to be automatically deleted, once RetentionPeriod is exceeded. The prefix must include a colon (:) as its final character, and may only include one colon. | drop-by:, ingest-by:, custom_prefix:
RetentionPeriod | timespan | The duration for which it’s guaranteed that the tags aren’t dropped. This period is measured starting from the extent’s creation time. | 1.00:00:00

Example

The following policy automatically drops any drop-by: tags older than three days and any ingest-by: tags older than two hours:

[
    {
        "TagPrefix": "drop-by:",
        "RetentionPeriod": "3.00:00:00"
    },
    {
        "TagPrefix": "ingest-by:",
        "RetentionPeriod": "02:00:00"
    }
]

Defaults

By default, when the policy isn’t defined, extent tags of any kind are retained as long as the extent isn’t dropped.

Management commands

The following management commands can be used to manage the extent tags retention policy:
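
For example, a table-level policy matching the example above could be applied as follows. This is a sketch only, assuming a hypothetical table named MyTable:

// Automatically drop drop-by: tags from MyTable's extents after three days.
.alter table MyTable policy extent_tags_retention @'[{"TagPrefix": "drop-by:", "RetentionPeriod": "3.00:00:00"}]'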

8 - Ingestion batching

8.1 - IngestionBatching policy

Learn how to use the IngestionBatching policy to optimize batching for ingestion.

Overview

During the queued ingestion process, the service optimizes for throughput by batching small ingress data chunks together before ingestion. Batching reduces the resources consumed by the queued ingestion process and doesn’t require post-ingestion resources to optimize the small data shards produced by non-batched ingestion.

The downside of batching before ingestion is the forced delay. Therefore, the end-to-end time from requesting the data ingestion until the data is ready for query is longer.

When you define the IngestionBatching policy, you’ll need to find a balance between optimizing for throughput and time delay. This policy applies to queued ingestion. It defines the maximum forced delay allowed when batching small blobs together. To learn more about using batching policy commands, and optimizing for throughput, see:

Sealing a batch

There’s an optimal size of about 1 GB of uncompressed data for bulk ingestion. Ingestion of blobs with much less data is suboptimal, so in queued ingestion the service will batch small blobs together.

The following list shows the basic batching policy triggers to seal a batch. A batch is sealed and ingested when the first condition is met:

  • Size: Batch size limit reached or exceeded
  • Count: Batch file number limit reached
  • Time: Batching time has expired

The IngestionBatching policy can be set on databases or tables. Default values are as follows: 5 minutes maximum delay time, 500 items, total size of 1 GB.

The following list shows conditions to seal batches related to single blob ingestion. A batch is sealed and ingested when the conditions are met:

  • SingleBlob_FlushImmediately: Ingest a single blob because ‘FlushImmediately’ was set
  • SingleBlob_IngestIfNotExists: Ingest a single blob because ‘IngestIfNotExists’ was set
  • SingleBlob_IngestByTag: Ingest a single blob because ‘ingest-by’ was set
  • SingleBlob_SizeUnknown: Ingest a single blob because blob size is unknown

If the SystemFlush condition is set, a batch will be sealed when a system flush is triggered. With the SystemFlush parameter set, the system flushes the data, for example due to database scaling or internal reset of system components.

Defaults and limits

Type | Property | Default | Low latency setting | Minimum value | Maximum value
Number of items | MaximumNumberOfItems | 500 | 500 | 1 | 25,000
Data size (MB) | MaximumRawDataSizeMB | 1024 | 1024 | 100 | 4096
Time (TimeSpan) | MaximumBatchingTimeSpan | 00:05:00 | 00:00:20 - 00:00:30 | 00:00:10 | 00:30:00

The most effective way to control end-to-end latency with the ingestion batching policy is to alter its time boundary at the table or database level, according to the upper bound of your latency requirements. A database-level policy affects all tables in that database that don’t have a table-level policy defined, and any newly created table.
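
For example, the time boundary can be lowered at the table level with the ingestion batching management command. The following is a minimal sketch, assuming a hypothetical table MyTable; the values are illustrative:

// Seal batches for MyTable after at most 30 seconds, 500 items, or 1024 MB of data,
// whichever is reached first.
.alter table MyTable policy ingestionbatching @'{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'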

Batch data size

The batching policy data size is set for uncompressed data. For Parquet, AVRO, and ORC files, an estimation is calculated based on file size. For compressed data, the uncompressed data size is evaluated as follows in descending order of accuracy:

  1. If the uncompressed size is provided in the ingestion source options, that value is used.
  2. When ingesting local files using SDKs, zip archives and gzip streams are inspected to assess their raw size.
  3. If previous options don’t provide a data size, a factor is applied to the compressed data size to estimate the uncompressed data size.

Batching latencies

Latencies can result from many causes that can be addressed using batching policy settings.

Cause | Solution
Data latency matches the time setting, with too little data to reach the size or count limit | Reduce the time limit
Inefficient batching due to a large number of very small files | Increase the size of the source files. If using Kafka Sink, configure it to send data in ~100 KB chunks or higher. If you have many small files, increase the count (up to 2000) in the database or table ingestion policy.
Batching a large amount of uncompressed data | This is common when ingesting Parquet files. Incrementally decrease the size for the table or database batching policy towards 250 MB and check for improvement.
Backlog because the database is under scaled | Accept any Azure advisor suggestions to scale aside or scale up your database. Alternatively, manually scale your database to see if the backlog is closed. If these options don’t work, contact support for assistance.

9 - Ingestion time

9.1 - IngestionTime policy

This article describes IngestionTime policy.

The IngestionTime policy is an optional policy that can be set (enabled) on tables.

When enabled, Kusto adds a hidden datetime column to the table, called $IngestionTime. Now, whenever new data is ingested, the time of ingestion is recorded in the hidden column. That time is measured just before the data is committed.

Since the ingestion time column is hidden, you can’t directly query for its value. Instead, a special function called ingestion_time() retrieves that value. If there’s no datetime column in the table, or the IngestionTime policy wasn’t enabled when a record was ingested, a null value is returned.
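
For example, the following sketch enables the policy on a hypothetical table MyTable and then reads the recorded value with ingestion_time(); the Timestamp column is an assumed source-provided column:

// Enable the IngestionTime policy on the table.
.alter table MyTable policy ingestiontime true

// Compare the source timestamp with the ingestion time to estimate ingestion latency.
MyTable
| extend IngestedAt = ingestion_time()
| extend IngestionLatency = IngestedAt - Timestamp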

The IngestionTime policy is designed for two main scenarios:

  • To allow users to estimate the latency in ingesting data. Many tables with log data have a timestamp column. The timestamp value gets filled by the source and indicates the time when the record was produced. By comparing that column’s value with the ingestion time column, you can estimate the latency for getting the data in.

    [!NOTE] The calculated value is only an estimate, because the source and Kusto don’t necessarily have their clocks synchronized.

  • To support Database Cursors that let users issue consecutive queries, where each query is limited to the data that was ingested since the previous query.

For more information, see the management commands for managing the IngestionTime policy.

10 - Managed identity

10.1 - Kusto ManagedIdentity policy

Learn about the ManagedIdentity policy to control managed identities.

ManagedIdentity is a policy that controls which managed identities can be used for what purposes. For example, you can configure a policy that allows a specific managed identity to be used for accessing a storage account for ingestion purposes.

This policy can be enabled at the cluster and database levels. The policy is additive, meaning that for every operation that involves a managed identity, the operation will be permitted if the usage is allowed at either the cluster or database level.

Permissions

Creating or altering a managed identity policy requires AllDatabasesAdmin permissions.

The ManagedIdentity policy object

A cluster or database may have zero or more ManagedIdentity policy objects associated with it. Each ManagedIdentity policy object has the following user-definable properties: ObjectId and AllowedUsages. Other properties are automatically populated from the managed identity associated with the specified ObjectId and displayed for convenience.

The following table describes the properties of the ManagedIdentity policy object:

Property | Type | Required | Description
ObjectId | string | ✔️ | Either the actual object ID of the managed identity or the reserved keyword system to reference the System Managed Identity of the cluster on which the command is run.
ClientId | string | Not applicable | The client ID of the managed identity.
TenantId | string | Not applicable | The tenant ID of the managed identity.
DisplayName | string | Not applicable | The display name of the managed identity.
IsSystem | bool | Not applicable | A Boolean value indicating true if the identity is a System Managed Identity; false if otherwise.
AllowedUsages | string | ✔️ | A list of comma-separated allowed usage values for the managed identity. See managed identity usages.

The following is an example of a ManagedIdentity policy object:

{
  "ObjectId": "<objectID>",
  "ClientId": "<clientID>",
  "TenantId": "<tenantID",
  "DisplayName": "myManagedIdentity",
  "IsSystem": false,
  "AllowedUsages": "NativeIngestion, ExternalTable"
}

Managed identity usages

The following values specify the usages for which the configured managed identity can be used to authenticate:

Value | Description
All | All current and future usages are allowed.
AutomatedFlows | Run a Continuous Export or Update Policy automated flow on behalf of a managed identity.
AzureAI | Authenticate to an Azure OpenAI service using the ai_embed_text plugin with a managed identity.
DataConnection | Authenticate to data connections to an Event Hub or an Event Grid.
ExternalTable | Authenticate to external tables using connection strings configured with a managed identity.
NativeIngestion | Authenticate to an SDK for native ingestion from an external source.
SandboxArtifacts | Authenticate to external artifacts referenced in sandboxed plugins (e.g., Python) with a managed identity. This usage needs to be defined on the cluster level managed identity policy.
SqlRequest | Authenticate to an external database using the sql_request or cosmosdb_request plugin with a managed identity.
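
For example, the following sketch allows a managed identity, referenced by a placeholder object ID, to be used for external tables at the database level; the database name and usage are illustrative:

// Allow the specified managed identity to authenticate to external tables from MyDatabase.
.alter-merge database MyDatabase policy managed_identity "[{'ObjectId': '<objectId>', 'AllowedUsages': 'ExternalTable'}]"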

11 - Merge policy

11.1 - Extents merge policy

Learn how to use the merge policy to define how extents are merged.

The merge policy defines whether and how extents (data shards) should be merged.

There are two types of merge operations: Merge, which rebuilds indexes, and Rebuild, which completely reingests the data.

Both operation types result in a single extent that replaces the source extents.

By default, Rebuild operations are preferred. If there are extents that don’t fit the criteria for being rebuilt, then an attempt will be made to merge them.

Merge policy properties

The merge policy contains the following properties:

  • RowCountUpperBoundForMerge:
    • Defaults to 16,000,000.
    • Maximum allowed row count of the merged extent.
    • Applies to Merge operations, not Rebuild.
  • OriginalSizeMBUpperBoundForMerge:
    • Defaults to 30,000.
    • Maximum allowed original size (in MBs) of the merged extent.
    • Applies to Merge operations, not Rebuild.
  • MaxExtentsToMerge:
    • Defaults to 100.
    • Maximum allowed number of extents to be merged in a single operation.
    • Applies to Merge operations.
    • This value shouldn’t be changed.
  • AllowRebuild:
    • Defaults to true.
    • Defines whether Rebuild operations are enabled (in which case, they’re preferred over Merge operations).
  • AllowMerge:
    • Defaults to true.
    • Defines whether Merge operations are enabled, in which case, they’re less preferred than Rebuild operations.
  • MaxRangeInHours:
    • Defaults to 24.
    • The maximum allowed difference, in hours, between any two different extents’ creation times, so that they can still be merged.
    • Timestamps are of extent creation, and don’t relate to the actual data contained in the extents.
    • Applies to both Merge and Rebuild operations.
    • In materialized views: defaults to 336 (14 days), unless recoverability is disabled in the materialized view’s effective retention policy.
    • This value should be set according to the effective retention policy SoftDeletePeriod or cache policy DataHotSpan values. Take the lower of SoftDeletePeriod and DataHotSpan, and set MaxRangeInHours to 2-3% of it. See the examples below.
  • Lookback:
    • Defines the timespan during which extents are considered for rebuild/merge.
    • Supported values:
      • Default - The system-managed default. This is the recommended and default value, whose period is currently set to 14 days.
      • All - All extents, hot and cold, are included.
      • HotCache - Only hot extents are included.
      • Custom - Only extents whose age is under the provided CustomPeriod are included. CustomPeriod is a timespan value in the format dd.hh:mm.

Default policy example

The following example shows the default policy:

{
  "RowCountUpperBoundForMerge": 16000000,
  "OriginalSizeMBUpperBoundForMerge": 30000,
  "MaxExtentsToMerge": 100,,
  "MaxRangeInHours": 24,
  "AllowRebuild": true,
  "AllowMerge": true,
  "Lookback": {
    "Kind": "Default",
    "CustomPeriod": null
  }
}

MaxRangeInHours examples

min(SoftDeletePeriod (Retention Policy), DataHotSpan (Cache Policy)) | Max Range in hours (Merge Policy)
7 days (168 hours) | 4
14 days (336 hours) | 8
30 days (720 hours) | 18
60 days (1,440 hours) | 36
90 days (2,160 hours) | 60
180 days (4,320 hours) | 120
365 days (8,760 hours) | 250

When a database is created, it’s set with the default merge policy values mentioned above. The policy is by default inherited by all tables created in the database, unless their policies are explicitly overridden at table-level.

For more information, see management commands that allow you to manage merge policies for databases or tables.
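
For example, a single property can be adjusted at the table level with the merge policy management commands. The following is a sketch only, assuming a hypothetical table MyTable and aligning MaxRangeInHours with a 14-day retention period as in the examples above:

// Change only MaxRangeInHours; other merge policy properties keep their current values.
.alter-merge table MyTable policy merge @'{"MaxRangeInHours": 8}'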

12 - Mirroring policy

12.1 - Mirroring policy

Learn how to use the mirroring policy.

The mirroring policy commands allow you to view, change, partition, and delete your table mirroring policy. They also provide a way to check the mirroring latency by reviewing the operations mirroring status.

Management commands

The policy object

The mirroring policy includes the following properties:

Property | Description | Values | Default
Format | The format of your mirrored files. | Valid value is parquet. | parquet
ConnectionStrings | An array of connection strings that help configure and establish connections. This value is autopopulated. | |
IsEnabled | Determines whether the mirroring policy is enabled. When the mirroring policy is disabled and set to false, the underlying mirroring data is retained in the database. | true, false, null | null
Partitions | A comma-separated list of columns used to divide the data into smaller partitions. | See Partitions formatting. |

Data types mapping

To ensure compatibility and optimize queries, ensure that your data types are properly mapped to the parquet data types.

Event house to Delta parquet data types mapping

Event house data types are mapped to Delta Parquet data types using the following rules:

Event house data type | Delta data type
bool | boolean
datetime | timestamp OR date (for date-bound partition definitions)
dynamic | string
guid | string
int | integer
long | long
real | double
string | string
timespan | long
decimal | decimal(38,18)

For more information on Event house data types, see Scalar data types.

Example policy

{
  "Format": "parquet",
  "IsEnabled": true,
  "Partitions": null
}

13 - Partitioning policy

13.1 - Partitioning policy

Learn how to use the partitioning policy to improve query performance.

The partitioning policy defines if and how extents (data shards) should be partitioned for a specific table or a materialized view.

The policy triggers an additional background process that takes place after the creation of extents, following data ingestion. This process includes reingesting data from the source extents and producing homogeneous extents, in which all values of the column designated as the partition key reside within a single partition.

The primary objective of the partitioning policy is to enhance query performance in specific supported scenarios.

Supported scenarios

The following are the only scenarios in which setting a data partitioning policy is recommended. In all other scenarios, setting the policy isn’t advised.

  • Frequent filters on a medium or high cardinality string or guid column:
    • For example: multitenant solutions, or a metrics table where most or all queries filter on a column of type string or guid, such as the TenantId or the MetricId.
    • Medium cardinality is at least 10,000 distinct values.
    • Set the hash partition key to be the string or guid column, and set the PartitionAssignmentMode property to uniform.
  • Frequent aggregations or joins on a high cardinality string or guid column:
    • For example, IoT information from many different sensors, or academic records of many different students.
    • High cardinality is at least 1,000,000 distinct values, where the distribution of values in the column is approximately even.
    • In this case, set the hash partition key to be the column frequently grouped-by or joined-on, and set the PartitionAssignmentMode property to ByPartition.
  • Out-of-order data ingestion:
    • Data ingested into a table might not be ordered and partitioned into extents (shards) according to a specific datetime column that represents the data creation time and is commonly used to filter data. This could be due to a backfill from heterogeneous source files that include datetime values over a large time span.
    • In this case, set the uniform range datetime partition key to be the datetime column.
    • If you need retention and caching policies to align with the datetime values in the column, instead of aligning with the time of ingestion, set the OverrideCreationTime property to true.

Partition keys

The following kinds of partition keys are supported.

Kind | Column Type | Partition properties | Partition value
Hash | string or guid | Function, MaxPartitionCount, Seed, PartitionAssignmentMode | Function(ColumnName, MaxPartitionCount, Seed)
Uniform range | datetime | RangeSize, Reference, OverrideCreationTime | bin_at(ColumnName, RangeSize, Reference)

Hash partition key

If the policy includes a hash partition key, all homogeneous extents that belong to the same partition will be assigned to the same data node.

  • A hash-modulo function is used to partition the data.
  • Data in homogeneous (partitioned) extents is ordered by the hash partition key.
    • You don’t need to include the hash partition key in the row order policy, if one is defined on the table.
  • Queries that use the shuffle strategy, and in which the shuffle key used in join, summarize or make-series is the table’s hash partition key, are expected to perform better because the amount of data required to move across nodes is reduced.

Partition properties

Property | Description | Supported value(s) | Recommended value
Function | The name of a hash-modulo function to use. | XxHash64 |
MaxPartitionCount | The maximum number of partitions to create (the modulo argument to the hash-modulo function) per time period. | In the range (1,2048]. | Higher values lead to greater overhead of the data partitioning process and a higher number of extents for each time period. The recommended value is 128. Higher values significantly increase the overhead of partitioning the data post-ingestion and the size of metadata, and are therefore not recommended.
Seed | Use for randomizing the hash value. | A positive integer. | 1, which is also the default value.
PartitionAssignmentMode | The mode used for assigning partitions to nodes. | ByPartition: All homogeneous (partitioned) extents that belong to the same partition are assigned to the same node. Uniform: An extent’s partition values are disregarded, and extents are assigned uniformly to the nodes. | If queries don’t join or aggregate on the hash partition key, use Uniform. Otherwise, use ByPartition.

Hash partition key example

A hash partition key over a string-typed column named tenant_id. It uses the XxHash64 hash function, with MaxPartitionCount set to the recommended value 128, and the default Seed of 1.

{
  "ColumnName": "tenant_id",
  "Kind": "Hash",
  "Properties": {
    "Function": "XxHash64",
    "MaxPartitionCount": 128,
    "Seed": 1,
    "PartitionAssignmentMode": "Uniform"
  }
}

Uniform range datetime partition key

In these cases, you can reshuffle the data between extents so that each extent includes records from a limited time range. This process results in filters on the datetime column being more effective at query time.

The partition function used is bin_at() and isn’t customizable.

Partition properties

Property | Description | Recommended value
RangeSize | A timespan scalar constant that indicates the size of each datetime partition. | Start with the value 1.00:00:00 (one day). Don’t set a shorter value, because it may result in the table having a large number of small extents that can’t be merged.
Reference | A datetime scalar constant that indicates a fixed point in time, according to which datetime partitions are aligned. | Start with 1970-01-01 00:00:00. If there are records in which the datetime partition key has null values, their partition value is set to the value of Reference.
OverrideCreationTime | A bool indicating whether or not the result extent’s minimum and maximum creation times should be overridden by the range of the values in the partition key. | Defaults to false. Set to true if data isn’t ingested in order of time of arrival. For example, a single source file may include datetime values that are distant, and/or you may want to enforce retention or caching based on the datetime values rather than the time of ingestion.

When OverrideCreationTime is set to true, extents may be missed in the merge process. Extents are missed if their creation time is older than the Lookback period of the table’s Extents merge policy. To make sure that the extents are discoverable, set the Lookback property to HotCache.

Uniform range datetime partition example

The snippet shows a uniform datetime range partition key over a datetime typed column named timestamp. It uses datetime(2021-01-01) as its reference point, with a size of 7d for each partition, and doesn’t override the extents’ creation times.

{
  "ColumnName": "timestamp",
  "Kind": "UniformRange",
  "Properties": {
    "Reference": "2021-01-01T00:00:00",
    "RangeSize": "7.00:00:00",
    "OverrideCreationTime": false
  }
}

The policy object

By default, a table’s data partitioning policy is null, in which case data in the table won’t be repartitioned after it’s ingested.

The data partitioning policy has the following main properties:

  • PartitionKeys:

    • A collection of partition keys that define how to partition the data in the table, as described in Partition keys above.

  • EffectiveDateTime:

    • The UTC datetime from which the policy is effective.
    • This property is optional. If it isn’t specified, the policy will take effect for data ingested after the policy was applied.

Data partitioning example

The following is a data partitioning policy object with two partition keys:

  1. A hash partition key over a string-typed column named tenant_id.
    • It uses the XxHash64 hash function, with MaxPartitionCount set to the recommended value 128, and the default Seed of 1.
  2. A uniform datetime range partition key over a datetime type column named timestamp.
    • It uses datetime(2021-01-01) as its reference point, with a size of 7d for each partition.
{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "2021-01-01T00:00:00",
        "RangeSize": "7.00:00:00",
        "OverrideCreationTime": false
      }
    }
  ]
}
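
The policy object above can be applied with the partitioning policy management command. This is a sketch only, assuming a hypothetical table MyTable that has tenant_id and timestamp columns:

// Apply the two-key data partitioning policy to MyTable.
.alter table MyTable policy partitioning @'{"PartitionKeys": [{"ColumnName": "tenant_id", "Kind": "Hash", "Properties": {"Function": "XxHash64", "MaxPartitionCount": 128, "Seed": 1, "PartitionAssignmentMode": "Uniform"}}, {"ColumnName": "timestamp", "Kind": "UniformRange", "Properties": {"Reference": "2021-01-01T00:00:00", "RangeSize": "7.00:00:00", "OverrideCreationTime": false}}]}'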

Additional properties

The following properties can be defined as part of the policy. These properties are optional and we recommend not changing them.

Property | Description | Recommended value | Default value
MinRowCountPerOperation | Minimum target for the sum of row count of the source extents of a single data partitioning operation. | | 0
MaxRowCountPerOperation | Maximum target for the sum of the row count of the source extents of a single data partitioning operation. | Set a value lower than 5M if you see that the partitioning operations consume a large amount of memory or CPU per operation. | 0, with a default target of 5,000,000 records.
MaxOriginalSizePerOperation | Maximum target for the sum of the original size (in bytes) of the source extents of a single data partitioning operation. | If the partitioning operations consume a large amount of memory or CPU per operation, set a value lower than 5 GB. | 0, with a default target of 5,368,709,120 bytes (5 GB).

The data partitioning process

  • Data partitioning runs as a post-ingestion background process.
    • A table that is continuously ingested into is expected to always have a “tail” of data that is yet to be partitioned (nonhomogeneous extents).
  • Data partitioning runs only on hot extents, regardless of the value of the EffectiveDateTime property in the policy.
    • If partitioning cold extents is required, you need to temporarily adjust the caching policy.

You can monitor the partitioning status of tables with defined policies in a database by using the .show database extents partitioning statistics command and partitioning metrics.

Partitioning capacity

  • The data partitioning process results in the creation of more extents. The extents merge capacity may gradually increase, so that the process of merging extents can keep up.

  • If there’s a high ingestion throughput, or a large enough number of tables that have a partitioning policy defined, then the Extents partition capacity may gradually increase, so that the process of partitioning extents can keep up.

  • To avoid consuming too many resources, these dynamic increases are capped. You may be required to gradually and linearly increase them beyond the cap, if they’re used up entirely.

    • If increasing the capacities causes a significant increase in the use of the cluster’s resources, you can scale the cluster up/out, either manually, or by enabling autoscale.

Limitations

  • Attempts to partition data in a database that already has more than 5,000,000 extents will be throttled.
    • In such cases, the EffectiveDateTime property of partitioning policies of tables in the database will be automatically delayed by several hours, so that you can reevaluate your configuration and policies.

Outliers in partitioned columns

  • The following situations can contribute to imbalanced distribution of data across nodes, and degrade query performance:
    • If a hash partition key includes values that are much more prevalent than others, for example, an empty string, or a generic value (such as null or N/A).
    • The values represent an entity (such as tenant_id) that is more prevalent in the dataset.
  • If a uniform range datetime partition key has a large enough percentage of values that are “far” from the majority of the values in the column, the overhead of the data partitioning process is increased and may lead to many small extents to keep track of. An example of such a situation is datetime values from the distant past or future.

In both of these cases, either “fix” the data, or filter out any irrelevant records in the data before or at ingestion time, to reduce the overhead of the data partitioning. For example, use an update policy.

14 - Query acceleration policy

14.1 - Query acceleration policy (preview)

Learn how to use the query acceleration policy to accelerate queries over external delta tables.

An external table is a schema entity that references data stored external to a Kusto database. Queries run over external tables can be less performant than on data that is ingested due to various factors such as network calls to fetch data from storage, the absence of indexes, and more. Query acceleration allows specifying a policy on top of external delta tables. This policy defines a number of days to accelerate data for high-performance queries.

Query acceleration is supported in Azure Data Explorer over Azure Data Lake Store Gen2 or Azure blob storage external tables.

Query acceleration is supported in Eventhouse over OneLake, Azure Data Lake Store Gen2, or Azure blob storage external tables.

To enable query acceleration in the Fabric UI, see Query acceleration over OneLake shortcuts.

Limitations

  • The number of columns in the external table can’t exceed 900.
  • Delta tables with checkpoint V2 are not supported.
  • Query performance over accelerated external delta tables which have partitions may not be optimal during preview.
  • The feature assumes delta tables with static advanced features, for example column mapping doesn’t change, partitions don’t change, and so on. To change advanced features, first disable the policy, and once the change is made, re-enable the policy.
  • Schema changes on the delta table must also be followed with the respective .alter external delta table schema, which might result in acceleration starting from scratch if there was breaking schema change.
  • Index-based pruning isn’t supported for partitions.
  • Parquet files larger than 1 GB won’t be cached.
  • Query acceleration isn’t supported for external tables with impersonation authentication.

Known issues

  • Data in the external delta table that is optimized with the OPTIMIZE function will need to be reaccelerated.
  • If you run frequent MERGE/UPDATE/DELETE operations in delta, the underlying parquet files may be rewritten with changes and Kusto will skip accelerating such files, causing retrieval during query time.
  • The system assumes that all artifacts under the delta table directory have the same access level to the selected users. Different files having different access permissions under the delta table directory might result with unexpected behavior.

Commands for query acceleration
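
The policy is managed with the query acceleration policy commands. The following is a rough sketch only; the external table name and hot period are assumptions, and the exact syntax may change while the feature is in preview:

// Accelerate the last 30 days of data for the external delta table MyExternalDeltaTable.
.alter external table MyExternalDeltaTable policy query_acceleration '{"IsEnabled": true, "Hot": "30.00:00:00"}'

// Review the policy currently in effect.
.show external table MyExternalDeltaTable policy query_acceleration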

15 - Query weak consistency policy

15.1 - Query weak consistency policy

Learn how to use the query weak consistency policy to configure the weak consistency service.

The query weak consistency policy is a cluster-level policy object that configures the weak consistency service.

Management commands

The policy object

The query weak consistency policy includes the following properties:

Property | Description | Values | Default
PercentageOfNodes | The percentage of nodes in the cluster that execute the query weak consistency service (the selected nodes will execute the weakly consistent queries). | An integer between 1 and 100, or -1 for the default value (which is currently 20%). | -1
MinimumNumberOfNodes | Minimum number of nodes that execute the query weak consistency service (will determine the number of nodes in case PercentageOfNodes*#NodesInCluster is smaller). | A positive integer, or -1 for the default value (which is currently 2). Smaller or equal to MaximumNumberOfNodes. | -1
MaximumNumberOfNodes | Maximum number of nodes that execute the query weak consistency service (will determine the number of nodes in case PercentageOfNodes*#NodesInCluster is greater). | A positive integer, or -1 for the default value (which is currently 30). Greater or equal to MinimumNumberOfNodes. | -1
SuperSlackerNumberOfNodesThreshold | If the total number of nodes in the cluster exceeds this number, nodes that execute the weak consistency service will become 'super slacker', meaning they won't have data on them (in order to reduce load). See Warning below. | A positive integer that is greater than or equal to 4, or -1 for the default value (currently no threshold - weak consistency nodes won't become 'super slacker'). | -1
EnableMetadataPrefetch | When set to true, database metadata will be pre-loaded when the cluster comes up, and reloaded every few minutes, on all weak consistency nodes. When set to false, database metadata load will be triggered by queries (on demand), so some queries might be delayed (until the database metadata is pulled from storage). Database metadata must be reloaded from storage to query the database, when its age is greater than MaximumLagAllowedInMinutes. See Warning and Important below. | true or false | false
MaximumLagAllowedInMinutes | The maximum duration (in minutes) that weakly consistent metadata is allowed to lag behind. If metadata is older than this value, the most up-to-date metadata will be pulled from storage (when the database is queried, or periodically if EnableMetadataPrefetch is enabled). See Warning below. | An integer between 1 and 60, or -1 for the default value (currently 5 minutes). | -1
RefreshPeriodInSeconds | The refresh period (in seconds) to update database metadata on each weak consistency node. See Warning below. | An integer between 30 and 1800, or -1 for the default value (currently 120 seconds). | -1

Default policy

The default policy is:

{
  "PercentageOfNodes": -1,
  "MinimumNumberOfNodes": -1,
  "MaximumNumberOfNodes": -1,
  "SuperSlackerNumberOfNodesThreshold": -1,
  "EnableMetadataPrefetch": false,
  "MaximumLagAllowedInMinutes": -1,
  "RefreshPeriodInSeconds": -1
}
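
For example, the following sketch raises the percentage of nodes that execute the weak consistency service; the value is illustrative only:

// Have 30% of the cluster's nodes execute weakly consistent queries.
.alter-merge cluster policy query_weak_consistency @'{"PercentageOfNodes": 30}'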

16 - Restricted view access

16.1 - Restricted view access policy

Learn how to use the restricted view access policy to limit the principals who can query specified tables in a database.

The restricted view access policy is an optional security feature that governs view permissions on a table. By default, the policy is disabled. When enabled, the policy adds an extra layer of permission requirements for principals to access and view the table.

For a table with an enabled restricted view access policy, only principals assigned the UnrestrictedViewer role have the necessary permissions to view the table. Even principals with roles like Table Admin or Database Admin are restricted unless granted the UnrestrictedViewer role.

While the restricted view access policy is specific to individual tables, the UnrestrictedViewer role operates at the database level. As a result, a principal with the UnrestrictedViewer role has view permissions for all tables within the database. For more detailed information on managing table view access, see Manage view access to tables.
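
For example, the policy is toggled per table with the restricted view access management commands; MyTable here is a hypothetical table:

// Enable the restricted view access policy on MyTable.
.alter table MyTable policy restricted_view_access true

// Disable it again when no longer needed.
.alter table MyTable policy restricted_view_access false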

Limitations

17 - Retention policy

17.1 - Retention policy

Learn how to use the retention policy to control how data is removed.

The retention policy controls the mechanism that automatically removes data from tables or materialized views. It’s useful to remove data that continuously flows into a table, and whose relevance is age-based. For example, the policy can be used for a table that holds diagnostics events that may become uninteresting after two weeks.

The retention policy can be configured for a specific table or materialized view, or for an entire database. The policy then applies to all tables in the database that don’t override it. When the policy is configured both at the database and table level, the retention policy in the table takes precedence over the database policy.

Setting up a retention policy is important when data is continuously ingested, because it limits costs.

Data that is “outside” the retention policy is eligible for removal. There’s no specific guarantee when removal occurs. Data may “linger” even if the retention policy is triggered.

The retention policy is most commonly set to limit the age of the data since ingestion. For more information, see SoftDeletePeriod. Data isn’t deleted before the limit is exceeded, but deletion isn’t immediate following that point.

The policy object

A retention policy includes the following properties:

  • SoftDeletePeriod:
    • Time span for which it’s guaranteed that the data is kept available to query. The period is measured starting from the time the data was ingested.
    • Defaults to 1,000 years.
    • When altering the soft-delete period of a table or database, the new value applies to both existing and new data.
  • Recoverability:
    • Data recoverability (Enabled/Disabled) after the data was deleted.
    • Defaults to Enabled.
    • If set to Enabled, the data will be recoverable for 14 days after it’s been soft-deleted.
    • The recoverability period can't be configured.
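
For example, the following command is a sketch that sets both properties explicitly by passing the policy as a JSON property bag (MyTable is a placeholder table name):

.alter table MyTable policy retention @'{"SoftDeletePeriod": "30.00:00:00", "Recoverability": "Disabled"}'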

Management commands

Defaults

By default, when a database or a table is created, it doesn’t have a retention policy defined. Normally, the database is created and then immediately has its retention policy set by its creator according to known requirements. When you run a .show command for the retention policy of a database or table that hasn’t had its policy set, Policy appears as null.

The default retention policy, with the default values mentioned above, can be applied using the following commands.

.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
.alter materialized-view ViewName policy retention "{}"

These commands result in the following policy object being applied to the database, table, or materialized view.

{
  "SoftDeletePeriod": "365000.00:00:00",
  "Recoverability": "Enabled"
}

Clearing the retention policy of a database or table can be done using the following commands.

.delete database DatabaseName policy retention
.delete table TableName policy retention

Examples

The following examples apply to an environment that has a database named MyDatabase, with tables MyTable1, MyTable2, and MySpecialTable.

Soft-delete period of seven days and recoverability disabled

Set all tables in the database to have a soft-delete period of seven days and disabled recoverability.

  • Option 1 (Recommended): Set a database-level retention policy, and verify there are no table-level policies set.

    .delete table MyTable1 policy retention        // optional, only if the table previously had its policy set
    .delete table MyTable2 policy retention        // optional, only if the table previously had its policy set
    .delete table MySpecialTable policy retention  // optional, only if the table previously had its policy set
    .alter-merge database MyDatabase policy retention softdelete = 7d recoverability = disabled
    .alter-merge materialized-view ViewName policy retention softdelete = 7d 
    
  • Option 2: For each table, set a table-level retention policy, with a soft-delete period of seven days and recoverability disabled.

    .alter-merge table MyTable1 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MyTable2 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MySpecialTable policy retention softdelete = 7d recoverability = disabled
    

Soft-delete period of seven days for most tables, with 14 days and recoverability enabled for MySpecialTable

  • Set tables MyTable1 and MyTable2 to have a soft-delete period of seven days and recoverability disabled.

  • Set MySpecialTable to have a soft-delete period of 14 days and recoverability enabled.

  • Option 1 (Recommended): Set a database-level retention policy, and set a table-level retention policy.

    .delete table MyTable1 policy retention   // optional, only if the table previously had its policy set
    .delete table MyTable2 policy retention   // optional, only if the table previously had its policy set
    .alter-merge database MyDatabase policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MySpecialTable policy retention softdelete = 14d recoverability = enabled
    
  • Option 2: For each table, set a table-level retention policy, with the relevant soft-delete period and recoverability.

    .alter-merge table MyTable1 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MyTable2 policy retention softdelete = 7d recoverability = disabled
    .alter-merge table MySpecialTable policy retention softdelete = 14d recoverability = enabled
    

Soft-delete period of seven days, and MySpecialTable keeps its data indefinitely

Set tables MyTable1 and MyTable2 to have a soft-delete period of seven days, and have MySpecialTable keep its data indefinitely.

  • Option 1: Set a database-level retention policy, and set a table-level retention policy, with a soft-delete period of 1,000 years, the default retention policy, for MySpecialTable.

    .delete table MyTable1 policy retention   // optional, only if the table previously had its policy set
    .delete table MyTable2 policy retention   // optional, only if the table previously had its policy set
    .alter-merge database MyDatabase policy retention softdelete = 7d
    .alter table MySpecialTable policy retention "{}" // this sets the default retention policy
    
  • Option 2: For tables MyTable1 and MyTable2, set a table-level retention policy, and verify that the database-level and table-level policy for MySpecialTable aren’t set.

    .delete database MyDatabase policy retention   // optional, only if the database previously had its policy set
    .delete table MySpecialTable policy retention   // optional, only if the table previously had its policy set
    .alter-merge table MyTable1 policy retention softdelete = 7d
    .alter-merge table MyTable2 policy retention softdelete = 7d
    
  • Option 3: For tables MyTable1 and MyTable2, set a table-level retention policy. For table MySpecialTable, set a table-level retention policy with a soft-delete period of 1,000 years, the default retention policy.

    .alter-merge table MyTable1 policy retention softdelete = 7d
    .alter-merge table MyTable2 policy retention softdelete = 7d
    .alter table MySpecialTable policy retention "{}"
    

18 - Row level security policy

18.1 - Row level security policy

Learn how to use the Row Level Security policy to control access to rows in a database table.

Use group membership or execution context to control access to rows in a database table.

Row Level Security (RLS) simplifies the design and coding of security. It lets you apply restrictions on data row access in your application. For example, limit user access to rows relevant to their department, or restrict customer access to only the data relevant to their company.

The access restriction logic is located in the database tier, rather than away from the data in another application tier. The database system applies the access restrictions every time data access is attempted from any tier. This logic makes your security system more reliable and robust by reducing the surface area of your security system.

RLS lets you provide access to other applications and users, only to a certain portion of a table. For example, you might want to:

  • Grant access only to rows that meet some criteria
  • Anonymize data in some of the columns
  • Both of the above

For more information, see management commands for managing the Row Level Security policy.

Limitations

  • There’s no limit on the number of tables on which Row Level Security policy can be configured.
  • Row Level Security policy cannot be configured on External Tables.
  • The RLS policy can’t be enabled on a table under the following circumstances:
  • The RLS query can’t reference other tables that have Row Level Security policy enabled.
  • The RLS query can’t reference tables located in other databases.

Examples

Limit access to Sales table

In a table named Sales, each row contains details about a sale. One of the columns contains the name of the salesperson. Instead of giving your salespeople access to all records in Sales, enable a Row Level Security policy on this table to only return records where the salesperson is the current user:

Sales | where SalesPersonAadUser == current_principal()

You can also mask the email address:

Sales | where SalesPersonAadUser == current_principal() | extend EmailAddress = "****"

If you want every salesperson to see all the sales of a specific country/region, you can define a query similar to:

let UserToCountryMapping = datatable(User:string, Country:string)
[
  "john@domain.com", "USA",
  "anna@domain.com", "France"
];
Sales
| where Country in ((UserToCountryMapping | where User == current_principal_details()["UserPrincipalName"] | project Country))

If you have a group that contains the managers, you might want to give them access to all rows. Here’s the query for the Row Level Security policy.

let IsManager = current_principal_is_member_of('aadgroup=sales_managers@domain.com');
let AllData = Sales | where IsManager;
let PartialData = Sales | where not(IsManager) and (SalesPersonAadUser == current_principal()) | extend EmailAddress = "****";
union AllData, PartialData
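
To enforce one of these queries, wrap it in a function and set it as the table's Row Level Security policy. The following is a minimal sketch; the function name SalesRLSPolicy is illustrative:

.create-or-alter function SalesRLSPolicy() {
    Sales
    | where SalesPersonAadUser == current_principal()
}

.alter table Sales policy row_level_security enable "SalesRLSPolicy()"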

Expose different data to members of different Microsoft Entra groups

If you have multiple Microsoft Entra groups, and you want the members of each group to see a different subset of data, use this structure for an RLS query.

Customers
| where (current_principal_is_member_of('aadgroup=group1@domain.com') and <filtering specific for group1>) or
        (current_principal_is_member_of('aadgroup=group2@domain.com') and <filtering specific for group2>) or
        (current_principal_is_member_of('aadgroup=group3@domain.com') and <filtering specific for group3>)

Apply the same RLS function on multiple tables

First, define a function that receives the table name as a string parameter, and references the table using the table() operator.

For example:

.create-or-alter function RLSForCustomersTables(TableName: string) {
    table(TableName)
    | ...
}

Then configure RLS on multiple tables this way:

.alter table Customers1 policy row_level_security enable "RLSForCustomersTables('Customers1')"
.alter table Customers2 policy row_level_security enable "RLSForCustomersTables('Customers2')"
.alter table Customers3 policy row_level_security enable "RLSForCustomersTables('Customers3')"

Produce an error upon unauthorized access

If you want unauthorized table users to receive an error instead of an empty result, use the assert() function. The following example shows you how to produce this error in an RLS function:

.create-or-alter function RLSForCustomersTables() {
    MyTable
    | where assert(current_principal_is_member_of('aadgroup=mygroup@mycompany.com') == true, "You don't have access")
}

You can combine this approach with other examples. For example, you can display different results to users in different Microsoft Entra groups, and produce an error for everyone else.

Control permissions on follower databases

The RLS policy that you configure on the production database also takes effect in the follower databases. You can't configure different RLS policies on the production and follower databases. However, you can use the current_cluster_endpoint() function in your RLS query to achieve the same effect as having different RLS queries in follower databases.

For example:

.create-or-alter function RLSForCustomersTables() {
    let IsProductionCluster = current_cluster_endpoint() == "mycluster.eastus.kusto.windows.net";
    let DataForProductionCluster = TempTable | where IsProductionCluster;
    let DataForFollowerClusters = TempTable | where not(IsProductionCluster) | extend EmailAddress = "****";
    union DataForProductionCluster, DataForFollowerClusters
}

Control permissions on shortcut databases

The RLS policy that you configure on the production database also takes effect in the shortcut databases. You can't configure different RLS policies on the production and shortcut databases. However, you can use the current_cluster_endpoint() function in your RLS query to achieve the same effect as having different RLS queries in shortcut databases.

For example:

.create-or-alter function RLSForCustomersTables() {
    let IsProductionCluster = current_cluster_endpoint() == "mycluster.eastus.kusto.windows.net";
    let DataForProductionCluster = TempTable | where IsProductionCluster;
    let DataForFollowerClusters = TempTable | where not(IsProductionCluster) | extend EmailAddress = "****";
    union DataForProductionCluster, DataForFollowerClusters
}

More use cases

  • A call center support person may identify callers by several digits of their social security number. This number shouldn’t be fully exposed to the support person. An RLS policy can be applied on the table to mask all but the last four digits of the social security number in the result set of any query.
  • Set an RLS policy that masks personally identifiable information (PII), and enables developers to query production environments for troubleshooting purposes without violating compliance regulations.
  • A hospital can set an RLS policy that allows nurses to view data rows for their patients only.
  • A bank can set an RLS policy to restrict access to financial data rows based on an employee’s business division or role.
  • A multi-tenant application can store data from many tenants in a single tableset (which is efficient). They would use an RLS policy to enforce a logical separation of each tenant’s data rows from every other tenant’s rows, so each tenant can see only its data rows.

Performance impact on queries

When an RLS policy is enabled on a table, there will be some performance impact on queries that access that table. Access to the table will be replaced by the RLS query that’s defined on that table. The performance impact of an RLS query will normally consist of two parts:

  • Membership checks in Microsoft Entra ID: Checks are efficient. You can check membership in tens, or even hundreds of groups without major impact on the query performance.
  • Filters, joins, and other operations that are applied on the data: The impact depends on the complexity of the query.

For example:

let IsRestrictedUser = current_principal_is_member_of('aadgroup=some_group@domain.com');
let AllData = MyTable | where not(IsRestrictedUser);
let PartialData = MyTable | where IsRestrictedUser and (...);
union AllData, PartialData

If the user isn’t part of some_group@domain.com, then IsRestrictedUser is evaluated to false. The query that is evaluated is similar to this one:

let AllData = MyTable;           // the condition evaluates to `true`, so the filter is dropped
let PartialData = <empty table>; // the condition evaluates to `false`, so the whole expression is replaced with an empty table
union AllData, PartialData       // this will just return AllData, as PartialData is empty

Similarly, if IsRestrictedUser evaluates to true, then only the query for PartialData will be evaluated.

Improve query performance when RLS is used

Performance impact on ingestion

There’s no performance impact on ingestion.

19 - Row order policy

19.1 - Row order policy

Learn how to use the row order policy to order rows in an extent.

The row order policy sets the preferred arrangement of rows within an extent. The policy is optional and set at the table level.

The main purpose of the policy is to improve the performance of queries that are narrowed to a small subset of values in ordered columns. Additionally, it may contribute to improvements in compression.

Use management commands to alter, alter-merge, delete, or show the row order policy for a table.
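
For example, the following command is a minimal sketch that sets a row order policy on a hypothetical table; TenantId stands in for the large-dimension column that most queries filter on, and Timestamp for a time column:

.alter table MyTable policy roworder (TenantId asc, Timestamp desc)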

When to set the policy

It’s appropriate to set the policy under the following conditions:

  • Most queries filter on specific values of a certain large-dimension column, such as an “application ID” or a “tenant ID”
  • The data ingested into the table is unlikely to be preordered according to this column

Performance considerations

There are no hardcoded limits on the number of columns (sort keys) that can be defined as part of the policy. However, every additional column adds some overhead to the ingestion process, and the incremental benefit diminishes as more columns are added.

20 - Sandbox policy

20.1 - Sandbox policy

This article describes Sandbox policy.

Certain plugins run within sandboxes whose available resources are limited and controlled for security and for resource governance.

Sandboxes run on the nodes of your cluster. Some of their limitations are defined in sandbox policies, where each sandbox kind can have its own policy.

Sandbox policies are managed at cluster-level and affect all the nodes in the cluster.

Permissions

You must have AllDatabasesAdmin permissions to run this command.

The policy object

A sandbox policy has the following properties.

  • SandboxKind: Defines the type of the sandbox (such as PythonExecution or RExecution).
  • IsEnabled: Defines if sandboxes of this type may run on the cluster’s nodes.
    • The default value is false.
  • InitializeOnStartup: Defines whether sandboxes of this type are initialized on startup, or lazily, upon first use.
    • The default value is false. To ensure consistent performance and avoid any delays for running queries following service restart, set this property to true.
  • TargetCountPerNode: Defines how many sandboxes of this type are allowed to run on the cluster’s nodes.
    • Values can be between one and twice the number of processors per node.
    • The default value is 16.
  • MaxCpuRatePerSandbox: Defines the maximum CPU rate as a percentage of all available cores that a single sandbox can use.
    • Values can be between 1 and 100.
    • The default value is 50.
  • MaxMemoryMbPerSandbox: Defines the maximum amount of memory (in megabytes) that a single sandbox can use.
    • For Hyper-V technology sandboxes, values can be between 200 and 32768 (32 GB). The default value is 1024 (1 GB). The maximum memory of all sandboxes on a node (TargetCountPerNode * MaxMemoryMbPerSandbox) is 32768 (32 GB).
    • For legacy sandboxes, values can be between 200 and 65536 (64 GB). The default value is 20480 (20 GB).

If a policy isn’t explicitly defined for a sandbox kind, an implicit policy with the default values and IsEnabled set to true applies.

Example

The following policy sets different limits for PythonExecution and RExecution sandboxes:

[
  {
    "SandboxKind": "PythonExecution",
    "IsEnabled": true,
    "InitializeOnStartup": false,
    "TargetCountPerNode": 4,
    "MaxCpuRatePerSandbox": 55,
    "MaxMemoryMbPerSandbox": 8192
  },
  {
    "SandboxKind": "RExecution",
    "IsEnabled": true,
    "InitializeOnStartup": false,
    "TargetCountPerNode": 2,
    "MaxCpuRatePerSandbox": 50,
    "MaxMemoryMbPerSandbox": 10240
  }
]
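
To apply such a policy at the cluster level, pass the definition to the corresponding management command. The following sketch assumes AllDatabasesAdmin permissions and enables only the PythonExecution sandbox kind:

.alter cluster policy sandbox @'[{"SandboxKind": "PythonExecution", "IsEnabled": true, "InitializeOnStartup": true, "TargetCountPerNode": 4, "MaxCpuRatePerSandbox": 55, "MaxMemoryMbPerSandbox": 8192}]'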

20.2 - Sandboxes

This article describes Sandboxes.

Kusto can run sandboxes for specific flows that must be run in a secure and isolated environment. Examples of these flows are user-defined scripts that run using the Python plugin or the R plugin.

Sandboxes are run locally (meaning, processing is done close to the data), with no extra latency for remote calls.

Prerequisites and limitations

  • Sandboxes must run on VM sizes that support nested virtualization, in which sandboxes are implemented using Hyper-V technology and have no limitations.
  • The image for running the sandboxes is deployed to every cluster node and requires dedicated SSD space to run.
    • The estimated size is between 10 and 20 GB.
    • This affects the cluster’s data capacity, and may affect the cost of the cluster.

Runtime

  • A sandboxed query operator may use one or more sandboxes for its execution.
    • A sandbox is only used for a single query and is disposed of once that query completes.
    • When a node is restarted, for example, as part of a service upgrade, all running sandboxes on it are disposed of.
  • Each node maintains a predefined number of sandboxes that are ready for running incoming requests.
    • Once a sandbox is used, a new one is automatically made available to replace it.
  • If there are no pre-allocated sandboxes available to serve a query operator, it will be throttled until new sandboxes are available. For more information, see Errors. New sandbox allocation could take up to 10-15 seconds per sandbox, depending on the SKU and available resources on the data node.

Sandbox parameters

Some of the parameters can be controlled using a cluster-level sandbox policy, for each kind of sandbox.

  • Number of sandboxes per node: The number of sandboxes per node is limited.
    • Requests that are made when there’s no available sandbox will be throttled.
  • Initialize on startup: If set to false (default), sandboxes are lazily initialized on a node the first time a query requires a sandbox for its execution. If set to true, sandboxes are initialized as part of service startup.
    • With lazy initialization, the first execution of a plugin that uses sandboxes on a node includes a short warm-up period.
  • CPU: The maximum rate of CPU a sandbox can consume of its host’s processors is limited (default is 50%).
    • When the limit is reached, the sandbox’s CPU use is throttled, but execution continues.
  • Memory: The maximum amount of RAM a sandbox can consume of its host’s RAM is limited.
    • Default memory for Hyper-V technology is 1 GB, and for legacy sandboxes 20 GB.
    • Reaching the limit results in termination of the sandbox, and a query execution error.

Sandbox limitations

  • Network: A sandbox can’t interact with any resource on the virtual machine (VM) or outside of it.
    • A sandbox can’t interact with another sandbox.

Errors

| ErrorCode | Status | Message | Potential reason |
| --- | --- | --- | --- |
| E_SB_QUERY_THROTTLED_ERROR | TooManyRequests (429) | The sandboxed query was aborted because of throttling. Retrying after some backoff might succeed | There are no available sandboxes on the target node. New sandboxes should become available in a few seconds |
| E_SB_QUERY_THROTTLED_ERROR | TooManyRequests (429) | Sandboxes of kind '{kind}' haven't yet been initialized | The sandbox policy has recently changed. New sandboxes obeying the new policy will become available in a few seconds |
| | InternalServiceError (520) | The sandboxed query was aborted due to a failure in initializing sandboxes | An unexpected infrastructure failure |

VM Sizes supporting nested virtualization

The following table lists all modern VM sizes that support Hyper-V sandbox technology.

| Name | Category |
| --- | --- |
| Standard_L8s_v3 | storage-optimized |
| Standard_L16s_v3 | storage-optimized |
| Standard_L8as_v3 | storage-optimized |
| Standard_L16as_v3 | storage-optimized |
| Standard_E8as_v5 | storage-optimized |
| Standard_E16as_v5 | storage-optimized |
| Standard_E8s_v4 | storage-optimized |
| Standard_E16s_v4 | storage-optimized |
| Standard_E8s_v5 | storage-optimized |
| Standard_E16s_v5 | storage-optimized |
| Standard_E2ads_v5 | compute-optimized |
| Standard_E4ads_v5 | compute-optimized |
| Standard_E8ads_v5 | compute-optimized |
| Standard_E16ads_v5 | compute-optimized |
| Standard_E2d_v4 | compute-optimized |
| Standard_E4d_v4 | compute-optimized |
| Standard_E8d_v4 | compute-optimized |
| Standard_E16d_v4 | compute-optimized |
| Standard_E2d_v5 | compute-optimized |
| Standard_E4d_v5 | compute-optimized |
| Standard_E8d_v5 | compute-optimized |
| Standard_E16d_v5 | compute-optimized |
| Standard_D32d_v4 | compute-optimized |

21 - Sharding policy

21.1 - Data sharding policy

Learn how to use the data sharding policy to define if and how extents in the database are created.

The sharding policy defines if and how extents (data shards) in your cluster are created. You can only query data in an extent once it’s created.

The data sharding policy contains the following properties:

  • ShardEngineMaxRowCount:

    • Maximum row count for an extent created by an ingestion or rebuild operation.
    • Defaults to 1,048,576.
    • Not in effect for merge operations.
      • If you must limit the number of rows in extents created by merge operations, adjust the RowCountUpperBoundForMerge property in the entity’s extents merge policy.
  • ShardEngineMaxExtentSizeInMb:

    • Maximum allowed compressed data size (in megabytes) for an extent created by a merge or rebuild operation.
    • Defaults to 8,192 (8 GB).
  • ShardEngineMaxOriginalSizeInMb:

    • Maximum allowed original data size (in megabytes) for an extent created by a rebuild operation.
    • In effect only for rebuild operations.
    • Defaults to 3,072 (3 GB).

When a database is created, it contains the default data sharding policy. This policy is inherited by all tables created in the database (unless the policy is explicitly overridden at the table level).

Use the sharding policy management commands to manage data sharding policies for databases and tables.
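
For example, the following command is a minimal sketch that lowers the maximum row count for extents created in a hypothetical table, leaving the other properties at their defaults:

.alter-merge table MyTable policy sharding @'{"ShardEngineMaxRowCount": 750000}'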

22 - Streaming ingestion policy

22.1 - Streaming ingestion policy

Learn how to use the streaming ingestion policy to optimize operational processing of many tables where the stream of data is small.

Streaming ingestion target scenarios

Streaming ingestion should be used for the following scenarios:

  • Latency of less than a few seconds is required.
  • To optimize operational processing of many tables where the stream of data into each table is relatively small (a few records per second), but the overall data ingestion volume is high (thousands of records per second).

If the stream of data into each table is high (over 4 GB per hour), consider using queued ingestion.

Streaming ingestion policy definition

The streaming ingestion policy contains the following properties:

  • IsEnabled:
    • Defines the status of streaming ingestion functionality for the table or database.
    • Mandatory, with no default value; must be explicitly set to true or false.
  • HintAllocatedRate:
    • If set, provides a hint on the hourly volume of data in gigabytes expected for the table. This hint helps the system adjust the amount of resources that are allocated for the table in support of streaming ingestion.
    • Defaults to null (unset).

To enable streaming ingestion on a table, define the streaming ingestion policy with IsEnabled set to true. This definition can be set on a table itself or on the database. Defining this policy at the database level applies the same settings to all existing and future tables in the database. If the streaming ingestion policy is set at both the table and database levels, the table level setting takes precedence. This setting means that streaming ingestion can be generally enabled for the database but specifically disabled for certain tables, or the other way around.

Set the data rate hint

The streaming ingestion policy can provide a hint about the hourly volume of data expected for the table. This hint helps the system adjust the amount of resources allocated for the table in support of streaming ingestion. Set the hint if the rate of streaming data ingress into the table exceeds 1 GB per hour. If you set HintAllocatedRate in the streaming ingestion policy for the database, set it according to the table with the highest expected data rate. Avoid setting the effective hint for a table to a value much higher than the expected peak hourly data rate, because doing so might adversely affect query performance.
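
For example, the following commands sketch enabling streaming ingestion with a rate hint; the database and table names are placeholders, and 2.1 represents an expected peak of roughly 2.1 GB per hour:

.alter database MyDatabase policy streamingingestion '{"IsEnabled": true, "HintAllocatedRate": 2.1}'
.alter table MyTable policy streamingingestion '{"IsEnabled": true, "HintAllocatedRate": 2.1}'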

23 - Update policy

23.1 - Common scenarios for using table update policies

Learn about common scenarios that can use table update policies to perform complex transformations and save the results to destination tables.

This section describes some well-known scenarios that use update policies. Consider adopting these scenarios when your circumstances are similar.

In this article, you learn about the following common scenarios:

  • Medallion architecture data enrichment
  • Data routing
  • Optimize data models

Medallion architecture data enrichment

Update policies on tables provide an efficient way to apply rapid transformations and are compatible with the medallion lakehouse architecture in Fabric.

In the medallion architecture, when raw data lands in a landing table (bronze layer), an update policy can be used to apply initial transformations and save the enriched output to a silver layer table. This process can cascade, where the data from the silver layer table can trigger another update policy to further refine the data and hydrate a gold layer table.

The following diagram illustrates an example of a data enrichment update policy named Get_Values. The enriched data is output to a silver layer table, which includes a calculated timestamp value and lookup values based on the raw data.

Diagram showing the medallion architecture data enrichment scenario using update policies solution.

Data routing

A special case of data enrichment occurs when a raw data element contains data that must be routed to a different table based on one or more attributes of the data itself.

Consider an example that uses the same base data as the previous scenario, but this time there are three messages. The first message is a device telemetry message, the second message is a device alarm message, and the third message is an error.

To handle this scenario, three update policies are used. The Get_Telemetry update policy filters the device telemetry message, enriches the data, and saves it to the Device_Telemetry table. Similarly, the Get_Alarms update policy saves the data to the Device_Alarms table. Lastly, the Log_Error update policy sends unknown messages to the Error_Log table, allowing operators to detect malformed messages or unexpected schema evolution.
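
The following is a minimal sketch of one such routing policy; the source table name RawLanding and its columns (MessageType, Timestamp, DeviceId, Telemetry) are hypothetical, and Get_Telemetry simply filters and reshapes telemetry messages before they land in Device_Telemetry:

.create-or-alter function with (folder = 'UpdatePolicyFunctions') Get_Telemetry() {
    RawLanding
    | where MessageType == "Telemetry"
    | project Timestamp, DeviceId, Telemetry
}

.alter table Device_Telemetry policy update
@'[{"IsEnabled": true, "Source": "RawLanding", "Query": "Get_Telemetry()", "IsTransactional": false, "PropagateIngestionProperties": false}]'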

The following diagram depicts the example with the three update policies.

Diagram showing the data routing scenario using update policies solution.

Optimize data models

Update policies on tables are built for speed. Tables typically conform to star schema design, which supports the development of data models that are optimized for performance and usability.

Querying tables in a star schema often requires joining tables. However, table joins can lead to performance issues, especially when querying high volumes of data. To improve query performance, you can flatten the model by storing denormalized data at ingestion time.

Joining tables at ingestion time has the added benefit of operating on a small batch of data, resulting in a reduced computational cost of the join. This approach can massively improve the performance of downstream queries.

For example, you can enrich raw telemetry data from a device by looking up values from a dimension table. An update policy can perform the lookup at ingestion time and save the output to a denormalized table. Furthermore, you can extend the output with data sourced from a reference data table.

The following diagram depicts the example, which comprises an update policy named Enrich_Device_Data. It extends the output data with data sourced from the Site reference data table.

Diagram showing the optimized data models scenario using update policies solution.

23.2 - Run an update policy with a managed identity

This article describes how to use a managed identity for update policy.

The update policy must be configured with a managed identity in the following scenarios:

  • When the update policy query references tables in other databases
  • When the update policy query references tables with an enabled row level security policy

An update policy configured with a managed identity is performed on behalf of the managed identity.

In this article, you learn how to configure a system-assigned or user-assigned managed identity and create an update policy using that identity.

Prerequisites

Configure a managed identity

There are two types of managed identities:

  • System-assigned: A system-assigned identity is connected to your cluster and is removed when the cluster is removed. Only one system-assigned identity is allowed per cluster.

  • User-assigned: A user-assigned managed identity is a standalone Azure resource. Multiple user-assigned identities can be assigned to your cluster.

Select one of the following tabs to set up your preferred managed identity type.

User-assigned

  1. Follow the steps to Add a user-assigned identity.

  2. In the Azure portal, in the left menu of your managed identity resource, select Properties. Copy and save the Tenant Id and Principal ID for use in the following steps.

    Screenshot of Azure portal area with managed identity IDs.

  3. Run the following .alter-merge policy managed_identity command, replacing <objectId> with the managed identity Principal ID from the previous step. This command sets a managed identity policy on the cluster that allows the managed identity to be used with the update policy.

    .alter-merge cluster policy managed_identity ```[
        {
          "ObjectId": "<objectId>",
          "AllowedUsages": "AutomatedFlows"
        }
    ]```
    

    [!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

  4. Run the following command to grant the managed identity Database Viewer permissions over all databases referenced by the update policy query.

    .add database <DatabaseName> viewers ('aadapp=<objectId>;<tenantId>')
    

    Replace <DatabaseName> with the relevant database, <objectId> with the managed identity Principal ID from step 2, and <tenantId> with the Microsoft Entra ID Tenant Id from step 2.

System-assigned

  1. Follow the steps to Add a system-assigned identity.

  2. Copy and save the Object ID for use in a later step.

  3. Run the following .alter-merge policy managed_identity command. This command sets a managed identity policy on the cluster that allows the managed identity to be used with the update policy.

    .alter-merge cluster policy managed_identity ```[
        {
          "ObjectId": "system",
          "AllowedUsages": "AutomatedFlows"
        }
    ]```
    

    [!NOTE] To set the policy on a specific database, use database <DatabaseName> instead of cluster.

  4. Run the following command to grant the managed identity Database Viewer permissions over all databases referenced by the update policy query.

    .add database <DatabaseName> viewers ('aadapp=<objectId>')
    

    Replace <DatabaseName> with the relevant database and <objectId> with the managed identity Object ID you saved earlier.

Create an update policy

Select one of the following tabs to create an update policy that runs on behalf of a user-assigned or system-assigned managed identity.

User-assigned

Run the .alter table policy update command with the ManagedIdentity property set to the managed identity object ID.

For example, the following command alters the update policy of the table MyTable in the database MyDatabase. It’s important to note that both the Source and Query parameters should only reference objects within the same database where the update policy is defined. However, the code contained within the function specified in the Query parameter can interact with tables located in other databases. For example, the function MyUpdatePolicyFunction() can access OtherTable in OtherDatabase on behalf of a user-assigned managed identity. <objectId> should be a managed identity object ID.

.alter table MyDatabase.MyTable policy update
```
[
    {
        "IsEnabled": true,
        "Source": "MyTable",
        "Query": "MyUpdatePolicyFunction()",
        "IsTransactional": false,
        "PropagateIngestionProperties": false,
        "ManagedIdentity": "<objectId>"
    }
]
```

System-assigned

Run the .alter table policy update command with the ManagedIdentity property set to the managed identity object ID.

For example, the following command alters the update policy of the table MyTable in the database MyDatabase. It’s important to note that both the Source and Query parameters should only reference objects within the same database where the update policy is defined. However, the code contained within the function specified in the Query parameter can interact with tables located in other databases. For example, the function MyUpdatePolicyFunction() can access OtherTable in OtherDatabase on behalf of your system-assigned managed identity.

.alter table MyDatabase.MyTable policy update
```
[
    {
        "IsEnabled": true,
        "Source": "MyTable",
        "Query": "MyUpdatePolicyFunction()",
        "IsTransactional": false,
        "PropagateIngestionProperties": false,
        "ManagedIdentity": "system"
    }
]
```

23.3 - Update policy overview

Learn how to trigger an update policy to add data to a source table.

Update policies are automation mechanisms triggered when new data is written to a table. They eliminate the need for special orchestration by running a query to transform the ingested data and save the result to a destination table. Multiple update policies can be defined on a single table, allowing for different transformations and saving data to multiple tables simultaneously. The target tables can have a different schema, retention policy, and other policies from the source table.

For example, a high-rate trace source table can contain data formatted as a free-text column. The target table can include specific trace lines, with a well-structured schema generated from a transformation of the source table's free-text data using the parse operator. For more information, see common scenarios.

The following diagram depicts a high-level view of an update policy. It shows two update policies that are triggered when data is added to the second source table. Once they’re triggered, transformed data is added to the two target tables.

Diagram shows an overview of the update policy.

An update policy is subject to the same restrictions and best practices as regular ingestion. The policy scales out according to the cluster or Eventhouse size, and is more efficient when handling bulk ingestion.

Ingesting formatted data improves performance, and CSV is preferred because it's a well-defined format. Sometimes, however, you have no control over the format of the data, or you want to enrich ingested data, for example, by joining records with a static dimension table in your database.

Update policy query

If the update policy is defined on the target table, multiple queries can run on data ingested into a source table. If there are multiple update policies, the order of execution isn’t necessarily known.

Query limitations

  • The policy-related query can invoke stored functions, but:
    • It can't perform cross-cluster or cross-eventhouse queries.
    • It can't access external data or external tables.
    • It can't make callouts (by using a plugin).
  • The query doesn't have read access to tables that have the RestrictedViewAccess policy enabled.
  • For update policy limitations in streaming ingestion, see streaming ingestion limitations.
  • By default, the streaming ingestion policy is enabled for all tables in an Eventhouse. To use functions with the join operator in an update policy, the streaming ingestion policy must be disabled. Use the .alter table TableName policy streamingingestion PolicyObject command to disable it.

When referencing the Source table in the Query part of the policy, or in functions referenced by the Query part:

  • Don’t use the qualified name of the table. Instead, use TableName.
  • Don’t use database("<DatabaseName>").TableName or cluster("<ClusterName>").database("<DatabaseName>").TableName.
  • Don’t use the qualified name of the table. Instead, use TableName.
  • Don’t use database("<DatabaseName>").TableName or cluster("<EventhouseName>").database("<DatabaseName>").TableName.

The update policy object

A table can have zero or more update policy objects associated with it. Each such object is represented as a JSON property bag, with the following properties defined.

| Property | Type | Description |
| --- | --- | --- |
| IsEnabled | bool | States if the update policy is enabled (true) or disabled (false) |
| Source | string | Name of the table that triggers invocation of the update policy |
| Query | string | A query used to produce data for the update |
| IsTransactional | bool | States if the update policy is transactional or not; the default is false. If the policy is transactional and the update policy fails, the source table isn't updated. |
| PropagateIngestionProperties | bool | States if properties specified during ingestion to the source table, such as extent tags and creation time, apply to the target table. |
| ManagedIdentity | string | The managed identity on behalf of which the update policy runs. The managed identity can be an object ID, or the system reserved word. The update policy must be configured with a managed identity when the query references tables in other databases or tables with an enabled row level security policy. For more information, see Use a managed identity to run an update policy. |

Management commands

Update policy management commands include:

Update policy is initiated following ingestion

Update policies take effect when data is ingested or moved to a source table, or extents are created in a source table. These actions can be done using any of the following commands:

Remove data from source table

After ingesting data to the target table, you can optionally remove it from the source table. Set a soft-delete period of 0s (or 00:00:00) in the source table's retention policy, and set the update policy as transactional (see the sketch after the following list). The following conditions apply:

  • The source data isn’t queryable from the source table
  • The source data doesn’t persist in durable storage as part of the ingestion operation
  • Operational performance improves. Post-ingestion resources are reduced for background grooming operations on extents in the source table.
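
For example, with a source table named MySourceTable whose update policy is already marked as transactional, emptying the source table could look like this:

.alter-merge table MySourceTable policy retention softdelete = 0s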

Performance impact

Update policies can affect performance: the ingestion work for data extents is multiplied by the number of target tables. It's important to optimize the policy-related query. You can test an update policy's performance impact by invoking the policy on already-existing extents, before creating or altering the policy, or on the function used with the query.

Evaluate resource usage

Use .show queries to evaluate resource usage (CPU, memory, and so on) with the following parameters:

  • Set the Source property, the source table name, as MySourceTable
  • Set the Query property to call a function named MyFunction()
// '_extentId' is the ID of a recently created extent, that likely hasn't been merged yet.
let _extentId = toscalar(
    MySourceTable
    | project ExtentId = extent_id(), IngestionTime = ingestion_time()
    | where IngestionTime > ago(10m)
    | top 1 by IngestionTime desc
    | project ExtentId
);
// This scopes the source table to the single recent extent.
let MySourceTable =
    MySourceTable
    | where ingestion_time() > ago(10m) and extent_id() == _extentId;
// This invokes the function in the update policy (that internally references `MySourceTable`).
MyFunction

Transactional settings

The update policy IsTransactional setting defines whether the update policy is transactional and can affect the behavior of the policy update, as follows:

  • IsTransactional:false: If the value is set to the default value, false, the update policy doesn’t guarantee consistency between data in the source and target tables. If an update policy fails, data is ingested only to the source table and not to the target table. In this scenario, the ingestion operation is successful.
  • IsTransactional:true: If the value is set to true, the setting guarantees consistency between data in the source and target tables. If an update policy fails, data isn’t ingested to the source or target table. In this scenario, the ingestion operation is unsuccessful.

Handling failures

When policy updates fail, they’re handled differently based on whether the IsTransactional setting is true or false. Common reasons for update policy failures are:

  • A mismatch between the query output schema and the target table.
  • Any query error.

You can view update policy failures by using the .show ingestion failures command, as shown in the following query. Where needed, you can manually retry the ingestion.

.show ingestion failures
| where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true

Example of extract, transform, load

You can use update policy settings to perform extract, transform, load (ETL).

In this example, use an update policy with a simple function to perform ETL. First, we create two tables:

  • The source table - Contains a single string-typed column into which data is ingested.
  • The target table - Contains the desired schema. The update policy is defined on this table.
  1. Let’s create the source table:

    .create table MySourceTable (OriginalRecord:string)
    
  2. Next, create the target table:

    .create table MyTargetTable (Timestamp:datetime, ThreadId:int, ProcessId:int, TimeSinceStartup:timespan, Message:string)
    
  3. Then create a function to extract data:

    .create function
     with (docstring = 'Parses raw records into strongly-typed columns', folder = 'UpdatePolicyFunctions')
     ExtractMyLogs()
    {
        MySourceTable
        | parse OriginalRecord with "[" Timestamp:datetime "] [ThreadId:" ThreadId:int "] [ProcessId:" ProcessId:int "] TimeSinceStartup: " TimeSinceStartup:timespan " Message: " Message:string
        | project-away OriginalRecord
    }
    
  4. Now, set the update policy to invoke the function that we created:

    .alter table MyTargetTable policy update
    @'[{ "IsEnabled": true, "Source": "MySourceTable", "Query": "ExtractMyLogs()", "IsTransactional": true, "PropagateIngestionProperties": false}]'
    
  5. To empty the source table after data is ingested into the target table, define the retention policy on the source table to have 0s as its SoftDeletePeriod.

     .alter-merge table MySourceTable policy retention softdelete = 0s