Advanced data management
1 - .clear cluster cache external-artifacts command
.clear cluster cache external-artifacts
Clears cached external-artifacts of language plugins.
This command is useful when you update external-artifact files stored in external storage, as the cache may retain the previous versions. In such scenarios, executing this command will clear the cache entries and ensure that subsequent queries run with the latest version of the artifacts.
Permissions
You must have at least Database Admin permissions to run this command.
Syntax
.clear cluster cache external-artifacts ( ArtifactURI [, … ] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
ArtifactURI | string | ✔️ | The URI for the external-artifact to clear from the cache. |
Returns
This command returns a table with the following columns:
Column | Type | Description |
---|---|---|
ExternalArtifactUri | string | The external artifact URI. |
State | string | The result of the clear operation on the external artifact. |
Example
.clear cluster cache external-artifacts ("https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r", "https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py")
ExternalArtifactUri | State |
---|---|
https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r | Cleared successfully on all nodes |
https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py | Cleared successfully on all nodes |
2 - Data purge
The data platform supports the ability to delete individual records by using the Kusto .purge and related commands. You can also purge an entire table or purge records in a materialized view.
Purge guidelines
Carefully design your data schema and investigate relevant policies before storing personal data.
- In a best-case scenario, the retention period on this data is sufficiently short and data is automatically deleted.
- If retention period usage isn’t possible, isolate all data that is subject to privacy rules in a few tables. Optimally, use just one table and link to it from all other tables. This isolation allows you to run the data purge process on a few tables holding sensitive data, and avoid all other tables.
- The caller should make every attempt to batch the execution of .purge commands to 1-2 commands per table per day. Don’t issue multiple commands with unique user identity predicates. Instead, send a single command whose predicate includes all user identities that require purging.
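The batching guideline can be sketched on the client side. The following Python snippet is illustrative only (the helper name isn’t part of any Kusto SDK); it folds every identity that requires purging into a single predicate, so one .purge command covers them all instead of one command per identity:

```python
def build_purge_predicate(user_ids, column="UserId"):
    """Fold all identities that require purging into one 'in' predicate,
    so a single .purge command covers them all."""
    quoted = ", ".join("'{}'".format(uid) for uid in user_ids)
    return "where {} in ({})".format(column, quoted)

# One batched command per table per day, covering every identity at once:
predicate = build_purge_predicate(["id-001", "id-002", "id-003"])
print(predicate)  # where UserId in ('id-001', 'id-002', 'id-003')
```

The resulting predicate is then used once, in the single .purge command for the table.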
Purge process
The process of selectively purging data happens in the following steps:
Phase 1: Give an input with a table name and a per-record predicate, indicating which records to delete. Kusto scans the table looking to identify data extents that would participate in the data purge. The extents identified are those having one or more records for which the predicate returns true.
Phase 2: (Soft Delete) Replace each data extent in the table (identified in step (1)) with a reingested version. The reingested version shouldn’t have the records for which the predicate returns true. If new data isn’t being ingested into the table, then by the end of this phase, queries will no longer return data for which the predicate returns true. The duration of the purge soft delete phase depends on the following parameters:
- The number of records that must be purged
- Record distribution across the data extents in the cluster
- The number of nodes in the cluster
- The spare capacity the cluster has for purge operations
- Several other factors
The duration of phase 2 can vary between a few seconds to many hours.
Phase 3: (Hard Delete) Work back all storage artifacts that may have the “poison” data, and delete them from storage. This phase is done at least five days after the completion of the previous phase, but no longer than 30 days after the initial command. These timelines are set to follow data privacy requirements.
Issuing a .purge command triggers this process, which takes a few days to complete. If the density of records for which the predicate applies is sufficiently large, the process will effectively reingest all the data in the table. This reingestion has a significant impact on performance and COGS (cost of goods sold).
Purge limitations and considerations
The purge process is final and irreversible. It isn’t possible to undo this process or recover data that has been purged. Commands such as undo table drop can’t recover purged data. Rollback of the data to a previous version can’t go to before the latest purge command.
Before running the purge, verify the predicate by running a query and checking that the results match the expected outcome. You can also use the two-step process that returns the expected number of records that will be purged.
The .purge command is executed against the Data Management endpoint: https://ingest-[YourClusterName].[region].kusto.windows.net. The command requires database admin permissions on the relevant databases.

Due to the purge process performance impact, and to guarantee that purge guidelines have been followed, the caller is expected to modify the data schema so that minimal tables include relevant data, and to batch commands per table to reduce the significant COGS impact of the purge process.
The predicate parameter of the .purge command is used to specify which records to purge. Predicate size is limited to 1 MB. When constructing the predicate:

- Use the ‘in’ operator, for example, where [ColumnName] in ('Id1', 'Id2', .. , 'Id1000').
- Note the limits of the ‘in’ operator (the list can contain up to 1,000,000 values).
- If the query size is large, use the externaldata operator, for example where UserId in (externaldata(UserId:string) ["https://...blob.core.windows.net/path/to/file?..."]). The file stores the list of IDs to purge.
- The total query size, after expanding all externaldata blobs (total size of all blobs), can’t exceed 64 MB.
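A caller can pre-check the documented limits (1 MB predicate, 1,000,000 values in the ‘in’ list) before submitting a command. This is a minimal client-side sketch, assuming 1 MB means 1,048,576 bytes; the helper is hypothetical and not part of any Kusto SDK:

```python
MAX_PREDICATE_BYTES = 1024 * 1024   # predicate size is limited to 1 MB
MAX_IN_LIST_VALUES = 1_000_000      # 'in' operator list limit

def check_purge_limits(predicate, id_count):
    """Return a list of limit violations (empty when the predicate is safe)."""
    problems = []
    if len(predicate.encode("utf-8")) > MAX_PREDICATE_BYTES:
        problems.append("predicate exceeds 1 MB; use the externaldata operator")
    if id_count > MAX_IN_LIST_VALUES:
        problems.append("'in' list exceeds 1,000,000 values")
    return problems

print(check_purge_limits("where UserId in ('Id1', 'Id2')", 2))  # []
```

An empty result means the predicate is within the documented limits; otherwise switch to the externaldata pattern described above.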
Purge performance
Only one purge request can be executed on the cluster at any given time. All other requests are queued in the Scheduled state.
Monitor the purge request queue size, and keep within adequate limits to match the requirements applicable for your data.
To reduce purge execution time:
Follow the purge guidelines to decrease the amount of purged data.
Adjust the caching policy since purge takes longer on cold data.
Scale out the cluster.
Increase cluster purge capacity, after careful consideration, as detailed in Extents purge rebuild capacity.
Trigger the purge process
Purge table TableName records command
The purge command may be invoked in two ways, for differing usage scenarios:

Programmatic invocation: A single step that is intended to be invoked by applications. Calling this command directly triggers the purge execution sequence.
Syntax
// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[region].kusto.windows.net"

// To purge table records
.purge table [TableName] records in database [DatabaseName] with (noregrets='true') <| [Predicate]

// To purge materialized view records
.purge materialized-view [MaterializedViewName] records in database [DatabaseName] with (noregrets='true') <| [Predicate]
Human invocation: A two-step process that requires an explicit confirmation as a separate step. First invocation of the command returns a verification token, which should be provided to run the actual purge. This sequence reduces the risk of inadvertently deleting incorrect data.
[!NOTE] The first step in the two-step invocation requires running a query on the entire dataset, to identify records to be purged. This query may time out or fail on large tables, especially with a significant amount of cold cache data. In case of failures, validate the predicate yourself, and after verifying correctness use the single-step purge with the noregrets option.
Syntax
// Connect to the Data Management service - this command only works in Kusto.Explorer
#connect "https://ingest-[YourClusterName].[region].kusto.windows.net"
// Step #1 - retrieve a verification token (no records will be purged until step #2 is executed)
.purge table [TableName] records in database [DatabaseName] <| [Predicate]
// Step #2 - input the verification token to execute purge
.purge table [TableName] records in database [DatabaseName] with (verificationtoken=h'<verification token from step #1>') <| [Predicate]
To purge a materialized view, replace the table keyword with materialized-view, and replace TableName with the MaterializedViewName.
Parameters | Description |
---|---|
DatabaseName | Name of the database |
TableName / MaterializedViewName | Name of the table / materialized view to purge. |
Predicate | Identifies the records to purge. See purge predicate limitations. |
noregrets | If set, triggers a single-step activation. |
verificationtoken | In the two-step activation scenario (noregrets isn’t set), this token can be used to execute the second step and commit the action. If verificationtoken isn’t specified, it will trigger the command’s first step. Information about the purge will be returned with a token that should be passed back to the command to do step #2. |
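As a sketch of how an application might assemble these commands before sending them to the Data Management endpoint, the Python helper below performs string construction only; the function itself is hypothetical and not part of any Kusto SDK:

```python
def purge_records_command(entity, name, database, predicate,
                          noregrets=False, verification_token=None):
    """Build a '.purge ... records' command for a table or materialized view."""
    cmd = ".purge {} {} records in database {}".format(entity, name, database)
    if noregrets:
        # single-step (programmatic) activation
        cmd += " with (noregrets='true')"
    elif verification_token:
        # step #2 of the two-step (human) activation
        cmd += " with (verificationtoken=h'{}')".format(verification_token)
    # with neither option set, this is step #1 and returns a verification token
    return cmd + " <| " + predicate

step1 = purge_records_command("table", "MyTable", "MyDatabase",
                              "where CustomerId in ('X', 'Y')")
print(step1)
# .purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
```

The same helper covers materialized views by passing entity="materialized-view"; executing the resulting strings still requires a connection to the Data Management endpoint and database admin permissions.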
Purge predicate limitations
- The predicate must be a simple selection (for example, where [ColumnName] == 'X' / where [ColumnName] in ('X', 'Y', 'Z') and [OtherColumn] == 'A').
- Multiple filters must be combined with an 'and', rather than separate where clauses (for example, where [ColumnName] == 'X' and [OtherColumn] == 'Y' and not where [ColumnName] == 'X' | where [OtherColumn] == 'Y').
- The predicate can't reference tables other than the table being purged (TableName). The predicate can only include the selection statement (where). It can't project specific columns from the table (the output schema when running 'table | Predicate' must match the table schema).
- System functions (such as ingestion_time(), extent_id()) aren't supported.
Example: Two-step purge
To start purge in a two-step activation scenario, run step #1 of the command:
// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[region].kusto.windows.net"
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
.purge materialized-view MyView records in database MyDatabase <| where CustomerId in ('X', 'Y')
Output
NumRecordsToPurge | EstimatedPurgeExecutionTime | VerificationToken |
---|---|---|
1,596 | 00:00:02 | e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b |
Then, validate the NumRecordsToPurge before running step #2.
To complete a purge in a two-step activation scenario, use the verification token returned from step #1 to run step #2:
.purge table MyTable records in database MyDatabase
with(verificationtoken=h'e43c7....')
<| where CustomerId in ('X', 'Y')
.purge materialized-view MyView records in database MyDatabase
with(verificationtoken=h'e43c7....')
<| where CustomerId in ('X', 'Y')
Output
OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:00.1406211 | 2019-01-20 11:41:05.4391686 | | Scheduled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |
Example: Single-step purge
To trigger a purge in a single-step activation scenario, run the following command:
// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[region].kusto.windows.net"
.purge table MyTable records in database MyDatabase with (noregrets='true') <| where CustomerId in ('X', 'Y')
.purge materialized-view MyView records in database MyDatabase with (noregrets='true') <| where CustomerId in ('X', 'Y')
Output
OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:00.1406211 | 2019-01-20 11:41:05.4391686 | | Scheduled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |
Cancel purge operation command
If needed, you can cancel pending purge requests.
Syntax
// Cancel of a single purge operation
.cancel purge <OperationId>
// Cancel of all pending purge requests in a database
.cancel all purges in database <DatabaseName>
// Cancel of all pending purge requests, for all databases
.cancel all purges
Example: Cancel a single purge operation
.cancel purge aa894210-1c60-4657-9d21-adb2887993e1
Output
The output of this command is the same as the ‘show purges OperationId’ command output, showing the updated status of the purge operation being canceled.
If the attempt is successful, the operation state is updated to Canceled. Otherwise, the operation state isn't changed.
OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:00.1406211 | 2019-01-20 11:41:05.4391686 | | Canceled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |
Example: Cancel all pending purge operations in a database
.cancel all purges in database MyDatabase
Output
The output of this command is the same as the show purges command output, showing all operations in the database with their updated status.
Operations that were canceled successfully will have their status updated to Canceled. Otherwise, the operation state isn't changed.
OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5a34169e-8730-49f5-9694-7fde3a7a0139 | MyDatabase | MyTable | 2021-03-03 05:07:29.7050198 | 00:00:00.2971331 | 2021-03-03 05:07:30.0021529 | | Canceled | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |
2fa7c04c-6364-4ce1-a5e5-1ab921f518f5 | MyDatabase | MyTable | 2021-03-03 05:05:03.5035478 | 00:00:00.1406211 | 2021-03-03 05:05:03.6441689 | | InProgress | | | | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |
Track purge operation status
Status = ‘Completed’ indicates successful completion of the first phase of the purge operation, that is records are soft-deleted and are no longer available for querying. Customers aren’t expected to track and verify the second phase (hard-delete) completion. This phase is monitored internally.
Show purges command
The .show purges command shows purge operation status, either for a specific operation ID or within a requested time period.
.show purges <OperationId>
.show purges [in database <DatabaseName>]
.show purges from '<StartDate>' [in database <DatabaseName>]
.show purges from '<StartDate>' to '<EndDate>' [in database <DatabaseName>]
Properties | Description | Mandatory/Optional |
---|---|---|
OperationId | The Data Management operation ID output after executing the single phase or the second phase. | Mandatory |
StartDate | Lower time limit for filtering operations. If omitted, defaults to 24 hours before current time. | Optional |
EndDate | Upper time limit for filtering operations. If omitted, defaults to current time. | Optional |
DatabaseName | Database name to filter results. | Optional |
Examples
.show purges
.show purges c9651d74-3b80-4183-90bb-bbe9e42eadc4
.show purges from '2018-01-30 12:00'
.show purges from '2018-01-30 12:00' to '2018-02-25 12:00'
.show purges from '2018-01-30 12:00' to '2018-02-25 12:00' in database MyDatabase
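The optional clauses compose as shown in this client-side sketch (the helper is hypothetical and performs string construction only):

```python
def show_purges_command(operation_id=None, start=None, end=None, database=None):
    """Compose a .show purges command from the optional filters."""
    if operation_id:
        # a specific operation ID takes precedence over time filters
        return ".show purges " + operation_id
    cmd = ".show purges"
    if start:
        cmd += " from '{}'".format(start)
        if end:  # 'to' is only valid together with 'from'
            cmd += " to '{}'".format(end)
    if database:
        cmd += " in database " + database
    return cmd

print(show_purges_command(start="2018-01-30 12:00", database="MyDatabase"))
# .show purges from '2018-01-30 12:00' in database MyDatabase
```

When no filters are given, the command defaults to the last 24 hours of operations on the server side.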
Output
OperationId | DatabaseName | TableName | ScheduledTime | Duration | LastUpdatedOn | EngineOperationId | State | StateDetails | EngineStartTime | EngineDuration | Retries | ClientRequestId | Principal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c9651d74-3b80-4183-90bb-bbe9e42eadc4 | MyDatabase | MyTable | 2019-01-20 11:41:05.4391686 | 00:00:33.6782130 | 2019-01-20 11:42:34.6169153 | a0825d4d-6b0f-47f3-a499-54ac5681ab78 | Completed | Purge completed successfully (storage artifacts pending deletion) | 2019-01-20 11:41:34.6486506 | 00:00:04.4687310 | 0 | KE.RunCommand;1d0ad28b-f791-4f5a-a60f-0e32318367b7 | AAD app id=… |
- OperationId - the DM operation ID returned when executing purge.
- DatabaseName - database name (case sensitive).
- TableName - table name (case sensitive).
- ScheduledTime - time of executing the purge command to the DM service.
- Duration - total duration of the purge operation, including the execution DM queue wait time.
- EngineOperationId - the operation ID of the actual purge executing in the engine.
- State - purge state, can be one of the following values:
  - Scheduled - purge operation is scheduled for execution. If the job remains Scheduled, there’s probably a backlog of purge operations. See purge performance to clear this backlog. If a purge operation fails on a transient error, it will be retried by the DM and set to Scheduled again (so you may see an operation transition from Scheduled to InProgress and back to Scheduled).
  - InProgress - the purge operation is in progress in the engine.
  - Completed - purge completed successfully.
  - BadInput - purge failed on bad input and won’t be retried. This failure may be due to various issues such as a syntax error in the predicate, an illegal predicate for purge commands, a query that exceeds limits (for example, over 1M entities in an externaldata operator or over 64 MB of total expanded query size), and 404 or 403 errors for externaldata blobs.
  - Failed - purge failed and won’t be retried. This failure may happen if the operation was waiting in the queue for too long (over 14 days), due to a backlog of other purge operations or a number of failures that exceed the retry limit. The latter will raise an internal monitoring alert and will be investigated by the team.
- StateDetails - a description of the State.
- EngineStartTime - the time the command was issued to the engine. If there’s a large difference between this time and ScheduledTime, there’s usually a significant backlog of purge operations and the cluster isn’t keeping up with the pace.
- EngineDuration - time of actual purge execution in the engine. If purge was retried several times, it’s the sum of all the execution durations.
- Retries - number of times the operation was retried by the DM service due to a transient error.
- ClientRequestId - client activity ID of the DM purge request.
- Principal - identity of the purge command issuer.
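Based on the state descriptions above, a client tracking a purge only needs to poll until a terminal state is reached; Scheduled and InProgress are the only transient states. A minimal sketch of that classification (the function is illustrative, not an SDK API):

```python
# Terminal states per the .show purges state descriptions:
# Completed, BadInput, and Failed end the operation; Canceled
# is reached via the .cancel purge command.
TERMINAL_STATES = {"Completed", "BadInput", "Failed", "Canceled"}

def is_terminal(state):
    """True when the purge operation will make no further progress.
    Scheduled and InProgress operations should be polled again; note an
    operation can move from Scheduled to InProgress and back to
    Scheduled on a transient retry."""
    return state in TERMINAL_STATES

print(is_terminal("Scheduled"))  # False
print(is_terminal("Completed"))  # True
```

A polling loop would call .show purges with the operation ID and stop once is_terminal returns True.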
Purging an entire table
Purging a table includes dropping the table, and marking it as purged so that the hard delete process described in Purge process runs on it.
Dropping a table without purging it doesn’t delete all its storage artifacts. These artifacts are deleted according to the hard retention policy initially set on the table.
The purge table allrecords command is quick and efficient and is preferable to the purge records process, if applicable for your scenario.
Purge table TableName allrecords command
Similar to the '.purge table records' command, this command can be invoked in a programmatic (single-step) or in a manual (two-step) mode.
Programmatic invocation (single-step):
Syntax
// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"

.purge table [TableName] in database [DatabaseName] allrecords with (noregrets='true')
Human invocation (two-steps):
Syntax
// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"

// Step #1 - retrieve a verification token (the table will not be purged until step #2 is executed)
.purge table [TableName] in database [DatabaseName] allrecords

// Step #2 - input the verification token to execute purge
.purge table [TableName] in database [DatabaseName] allrecords with (verificationtoken=h'<verification token from step #1>')
Parameters | Description |
---|---|
DatabaseName | Name of the database. |
TableName | Name of the table. |
noregrets | If set, triggers a single-step activation. |
verificationtoken | In the two-step activation scenario (noregrets isn’t set), this token can be used to execute the second step and commit the action. If verificationtoken isn’t specified, it will trigger the command’s first step. In this step, a token is returned to pass back to the command to do step #2. |
Example: Two-step purge
To start purge in a two-step activation scenario, run step #1 of the command:
// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"

.purge table MyTable in database MyDatabase allrecords
Output
VerificationToken |
---|
e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b |

To complete a purge in a two-step activation scenario, use the verification token returned from step #1 to run step #2:
.purge table MyTable in database MyDatabase allrecords with (verificationtoken=h'eyJT.....')
The output is the same as the ‘.show tables’ command output (returned without the purged table).
Output
TableName | DatabaseName | Folder | DocString |
---|---|---|---|
OtherTable | MyDatabase | — | — |
Example: Single-step purge
To trigger a purge in a single-step activation scenario, run the following command:
// Connect to the Data Management service
#connect "https://ingest-[YourClusterName].[Region].kusto.windows.net"
.purge table MyTable in database MyDatabase allrecords with (noregrets='true')
The output is the same as the ‘.show tables’ command output (returned without the purged table).
Output
TableName | DatabaseName | Folder | DocString |
---|---|---|---|
OtherTable | MyDatabase | — | — |
3 - Delete data
Deleting data from a table is supported in several ways. Use the following information to help you choose which deletion method is best for your use case.
Use case | Considerations | Method |
---|---|---|
Delete all data from a table. | Use the .clear table data command | |
Routinely delete old data. | Use if you need an automated deletion solution. | Use a retention policy |
Bulk delete specific data by extents. | Only use if you’re an expert user. | Use the .drop extents command |
Delete records based on their content. | - Storage artifacts that contain the deleted records aren’t necessarily deleted. - Deleted records can’t be recovered (regardless of any retention or recoverability settings). - Use if you need a quick way to delete records. | Use soft delete |
Delete records based on their content. | - Storage artifacts that contain the deleted records are deleted. - Deleted records can’t be recovered (regardless of any retention or recoverability settings). - Requires significant system resources and time to complete. | Use purge |
The following sections describe the different deletion methods.
Delete all data in a table
To delete all data in a table, use the .clear table data command. This command is the most efficient way to remove all data from a table.
Syntax:
.clear table <TableName> data
Delete data using a retention policy
Automatically delete data based on a retention policy. You can set the retention policy at the database or table level. There’s no guarantee as to when the deletion occurs, but data isn’t deleted before the retention period ends. This is an efficient and convenient way to remove old data.
Consider a database or table that is set for 90 days of retention. If only 60 days of data are needed, delete the older data as follows:
.alter-merge database <DatabaseName> policy retention softdelete = 60d
.alter-merge table <TableName> policy retention softdelete = 60d
Delete data by dropping extents
An extent (data shard) is the internal structure where data is stored. Each extent can hold up to millions of records. Extents can be deleted individually or as a group by using the drop extent(s) commands.
Examples
You can delete all rows in a table or just a specific extent.
Delete all rows in a table:
.drop extents from TestTable
Delete a specific extent:
.drop extent e9fac0d2-b6d5-4ce3-bdb4-dea052d13b42
Delete individual rows
Both purge and soft delete can be used for deleting individual rows. Soft delete doesn’t necessarily delete the storage artifacts that contain the records to delete, while purge does delete all such storage artifacts.
Both methods prevent deleted records from being recovered, regardless of any retention or recoverability settings. The deletion process is final and irreversible.
Soft delete
With soft delete, data isn’t necessarily deleted from storage artifacts. This method marks all matching records as deleted, so that they’ll be filtered out in queries, and doesn’t require significant system resources.
Purge
With purge, extents that have one or more records to be deleted, are replaced with new extents in which those records don’t exist. This deletion process isn’t immediate, requires significant system resources, and can take a whole day to complete.
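The extent-replacement behavior can be illustrated with a toy model. This is pure illustration (real extents are immutable storage shards, and reingestion is performed by the service): every extent holding at least one matching record is replaced by a filtered copy, while extents with no matches stay untouched.

```python
def purge_extents(extents, should_purge):
    """Toy model of the purge soft-delete phase: replace each extent that
    holds at least one matching record with a reingested copy without
    those records; extents with no matches are untouched."""
    result = []
    for extent in extents:
        if any(should_purge(rec) for rec in extent):
            kept = [rec for rec in extent if not should_purge(rec)]
            if kept:                   # reingest the surviving records
                result.append(kept)
            # an extent whose records all matched simply disappears
        else:
            result.append(extent)      # no matches: extent is untouched
    return result

extents = [["a1", "x1"], ["a2", "a3"], ["x2"]]
print(purge_extents(extents, lambda r: r.startswith("x")))
# [['a1'], ['a2', 'a3']]
```

The model also hints at why purge cost scales with how the matching records are spread across extents: a few matches scattered over many extents force many replacements.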
4 - Follower commands
Management commands for managing your follower configuration. These commands run synchronously but are applied on the next periodic schema refresh, which may result in a short delay until the new configuration is applied.
The follower commands include database level commands and table level commands.
Permissions
You must have at least Database Admin permissions to run these commands.
Database policy overrides
A leader database can override the following database-level policies in the follower cluster: Caching policy and Authorized principals.
Caching policy
The default caching policy for the follower cluster uses the leader cluster database and table-level caching policies.
Option | Description |
---|---|
None | The caching policies used are those policies defined in the source database in the leader cluster. |
replace | The database and table-level caching policies from the source database in the leader cluster are removed (set to null). They are replaced by the database and table-level override policies, if defined. |
union (default) | The database and table-level caching policies from the source database in the leader cluster are combined with the policies defined in the database and table-level override policies. |
Authorized principals
Option | Description |
---|---|
None | The authorized principals are defined in the source database of the leader cluster. |
replace | The override authorized principals replace the authorized principals from the source database in the leader cluster. |
union (default) | The override authorized principals are combined with the authorized principals from the source database in the leader cluster. |
Table and materialized views policy overrides
By default, tables and materialized views in a database that is being followed by a follower cluster keep the source entity’s caching policy.
However, table and materialized view caching policies can be overridden in the follower cluster.
Use the replace option to override the source entity’s caching policy.
Database level commands
.show follower database
Shows a database (or databases) followed from another leader cluster that has one or more database-level overrides configured.
Syntax
.show follower database DatabaseName

.show follower databases ( DatabaseName1, …, DatabaseNameN )
Output
Output parameter | Type | Description |
---|---|---|
DatabaseName | string | The name of the database being followed. |
LeaderClusterMetadataPath | string | The path to the leader cluster’s metadata container. |
CachingPolicyOverride | string | An override caching policy for the database, serialized as JSON, or null. |
AuthorizedPrincipalsOverride | string | An override collection of authorized principals for the database, serialized as JSON, or null. |
AuthorizedPrincipalsModificationKind | string | The modification kind to apply using AuthorizedPrincipalsOverride (none , union , or replace ). |
CachingPoliciesModificationKind | string | The modification kind to apply using database or table-level caching policy overrides (none , union , or replace ). |
IsAutoPrefetchEnabled | bool | Whether new data is pre-fetched upon each schema refresh. |
TableMetadataOverrides | string | If defined, a JSON serialization of table-level property overrides. |
.alter follower database policy caching
Alters a follower database caching policy, to override the one set on the source database in the leader cluster.
Notes

- The default modification kind for caching policies is union. To change the modification kind, use the .alter follower database caching-policies-modification-kind command.
- Viewing the policy or effective policies after the change can be done using the .show commands.
- Viewing the override settings on the follower database after the change is made can be done using .show follower database.
Syntax
.alter follower database DatabaseName policy caching hot = HotDataSpan
Example
.alter follower database MyDb policy caching hot = 7d
.delete follower database policy caching
Deletes a follower database override caching policy. This deletion makes the policy set on the source database in the leader cluster the effective one.
Notes

- Viewing the policy or effective policies after the change can be done using the .show commands.
- Viewing the override settings on the follower database after the change can be done using .show follower database.
Syntax
.delete follower database DatabaseName policy caching
Example
.delete follower database MyDB policy caching
.add follower database principals
Adds authorized principal(s) to the follower database collection of override authorized principals.

Notes

- The default modification kind for such authorized principals is none. To change the modification kind, use .alter follower database principals-modification-kind.
- Viewing the effective collection of principals after the change can be done using the .show commands.
- Viewing the override settings on the follower database after the change can be done using .show follower database.
Syntax
.add follower database DatabaseName (admins | users | viewers | monitors) ( principal1, …, principalN ) ['notes']
Example
.add follower database MyDB viewers ('aadgroup=mygroup@microsoft.com') 'My Group'
.drop follower database principals
Drops authorized principal(s) from the follower database collection of override authorized principals.
Syntax
.drop follower database DatabaseName (admins | users | viewers | monitors) ( principal1, …, principalN )
Example
.drop follower database MyDB viewers ('aadgroup=mygroup@microsoft.com')
.alter follower database principals-modification-kind
Alters the follower database authorized principals modification kind.
Syntax
.alter follower database DatabaseName principals-modification-kind = (none | union | replace)
Example
.alter follower database MyDB principals-modification-kind = union
.alter follower database caching-policies-modification-kind
Alters the caching policies modification kind for the follower database, table, and materialized views.
Syntax
.alter follower database DatabaseName caching-policies-modification-kind = (none | union | replace)
Example
.alter follower database MyDB caching-policies-modification-kind = union
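To make the three modification kinds concrete, here is an illustrative Python sketch of how follower overrides might combine with leader settings under each kind. This is a conceptual model only, not engine code; the function name and the assumption that overrides win on conflict under union are illustrative.

```python
def effective_settings(leader, overrides, modification_kind):
    """Conceptual model of combining leader settings with follower overrides.

    leader, overrides: dicts keyed by entity name (e.g., table name).
    Illustrative only; not actual service behavior.
    """
    if modification_kind == "none":
        return dict(leader)            # follower overrides are ignored
    if modification_kind == "replace":
        return dict(overrides)         # only the follower overrides apply
    if modification_kind == "union":
        merged = dict(leader)
        merged.update(overrides)       # assumption: overrides win on conflict
        return merged
    raise ValueError(f"unknown modification kind: {modification_kind}")


leader = {"MyTable1": "7d", "MyTable2": "30d"}
overrides = {"MyTable1": "1d"}
print(effective_settings(leader, overrides, "union"))
```

Under union, tables without an override keep the leader's policy, while overridden tables use the follower's value.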
.alter follower database prefetch-extents
The follower cluster can wait for new data to be fetched from the underlying storage to the nodes’ SSD (cache) before making this data queryable.
The following command alters the follower database configuration of pre-fetching new extents upon each schema refresh.
Syntax
.alter follower database DatabaseName prefetch-extents = (true | false)
Example
.alter follower database MyDB prefetch-extents = false
Tables and materialized views commands
Alter follower table or materialized view caching policy
Alters a table’s or a materialized view’s caching policy on the follower database, to override the policy set on the source database in the leader cluster.
Syntax
.alter follower database DatabaseName table TableName policy caching hot = HotDataSpan
.alter follower database DatabaseName tables (TableName1, …, TableNameN) policy caching hot = HotDataSpan
.alter follower database DatabaseName materialized-view ViewName policy caching hot = HotDataSpan
.alter follower database DatabaseName materialized-views (ViewName1, …, ViewNameN) policy caching hot = HotDataSpan
Examples
.alter follower database MyDb tables (Table1, Table2) policy caching hot = 7d
.alter follower database MyDb materialized-views (View1, View2) policy caching hot = 7d
Delete follower table or materialized view caching policy
Deletes an override for a table’s or a materialized-view’s caching policy on the follower database. The policy set on the source database in the leader cluster will now be the effective policy.
Syntax
.delete follower database DatabaseName table TableName policy caching
.delete follower database DatabaseName tables (TableName1, …, TableNameN) policy caching
.delete follower database DatabaseName materialized-view ViewName policy caching
.delete follower database DatabaseName materialized-views (ViewName1, …, ViewNameN) policy caching
Example
.delete follower database MyDB tables (Table1, Table2) policy caching
.delete follower database MyDB materialized-views (View1, View2) policy caching
Sample configuration
The following are sample steps to configure a follower database.
In this example:
- Our follower cluster, MyFollowerCluster, will be following database MyDatabase from the leader cluster, MyLeaderCluster.
- MyDatabase has N tables: MyTable1, MyTable2, MyTable3, … MyTableN (N > 3).
- On MyLeaderCluster:
  - MyTable1 caching policy: hot data span = 7d
  - MyTable2 caching policy: hot data span = 30d
  - MyTable3 … MyTableN caching policy: hot data span = 365d
  - MyDatabase authorized principals: Viewers = aadgroup=scubadivers@contoso.com; Admins = aaduser=jack@contoso.com
- On MyFollowerCluster, we want:
  - MyTable1 caching policy: hot data span = 1d
  - MyTable2 caching policy: hot data span = 3d
  - MyTable3 … MyTableN caching policy: hot data span = 0d (nothing is cached)
  - MyDatabase authorized principals: Admins = aaduser=jack@contoso.com; Viewers = aaduser=jill@contoso.com
Steps to execute
Prerequisite: Set up cluster MyFollowerCluster to follow database MyDatabase from cluster MyLeaderCluster.
Show the current configuration
See the current configuration according to which MyDatabase
is being followed on MyFollowerCluster
:
.show follower database MyDatabase
| evaluate narrow() // just for presentation purposes
Column | Value |
---|---|
DatabaseName | MyDatabase |
LeaderClusterMetadataPath | https://storageaccountname.blob.core.windows.net/cluster |
CachingPolicyOverride | null |
AuthorizedPrincipalsOverride | [] |
AuthorizedPrincipalsModificationKind | None |
IsAutoPrefetchEnabled | False |
TableMetadataOverrides | |
CachingPoliciesModificationKind | Union |
Override authorized principals
Replace the collection of authorized principals for MyDatabase
on MyFollowerCluster
with a collection that includes only one Microsoft Entra user as the database admin, and one Microsoft Entra user as a database viewer:
.add follower database MyDatabase admins ('aaduser=jack@contoso.com')
.add follower database MyDatabase viewers ('aaduser=jill@contoso.com')
.alter follower database MyDatabase principals-modification-kind = replace
Only those two specific principals are authorized to access MyDatabase on MyFollowerCluster.
.show database MyDatabase principals
Role | PrincipalType | PrincipalDisplayName | PrincipalObjectId | PrincipalFQN | Notes |
---|---|---|---|---|---|
Database MyDatabase Admin | Microsoft Entra user | Jack Kusto (upn: jack@contoso.com) | 12345678-abcd-efef-1234-350bf486087b | aaduser=87654321-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 | |
Database MyDatabase Viewer | Microsoft Entra user | Jill Kusto (upn: jill@contoso.com) | abcdefab-abcd-efef-1234-350bf486087b | aaduser=54321789-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 |
.show follower database MyDatabase
| mv-expand parse_json(AuthorizedPrincipalsOverride)
| project AuthorizedPrincipalsOverride.Principal.FullyQualifiedName
AuthorizedPrincipalsOverride_Principal_FullyQualifiedName |
---|
aaduser=87654321-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 |
aaduser=54321789-abcd-efef-1234-350bf486087b;55555555-4444-3333-2222-2d7cd011db47 |
Override caching policies
Replace the collection of database and table-level caching policies for MyDatabase
on MyFollowerCluster
by setting all tables to not have their data cached, excluding two specific tables - MyTable1
, MyTable2
- that will have their data cached for periods of 1d
and 3d
, respectively:
.alter follower database MyDatabase policy caching hot = 0d
.alter follower database MyDatabase table MyTable1 policy caching hot = 1d
.alter follower database MyDatabase table MyTable2 policy caching hot = 3d
.alter follower database MyDatabase caching-policies-modification-kind = replace
Only those two specific tables have data cached, and the rest of the tables have a hot data period of 0d
:
.show tables details
| summarize TableNames = make_list(TableName) by CachingPolicy
CachingPolicy | TableNames |
---|---|
{"DataHotSpan":{"Value":"1.00:00:00"},"IndexHotSpan":{"Value":"1.00:00:00"}} | ["MyTable1"] |
{"DataHotSpan":{"Value":"3.00:00:00"},"IndexHotSpan":{"Value":"3.00:00:00"}} | ["MyTable2"] |
{"DataHotSpan":{"Value":"0.00:00:00"},"IndexHotSpan":{"Value":"0.00:00:00"}} | ["MyTable3",…,"MyTableN"] |
.show follower database MyDatabase
| mv-expand parse_json(TableMetadataOverrides)
| project TableMetadataOverrides
TableMetadataOverrides |
---|
{"MyTable1":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"1.00:00:00"},"IndexHotSpan":{"Value":"1.00:00:00"}}}} |
{"MyTable2":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"3.00:00:00"},"IndexHotSpan":{"Value":"3.00:00:00"}}}} |
Summary
See the current configuration where MyDatabase
is being followed on MyFollowerCluster
:
.show follower database MyDatabase
| evaluate narrow() // just for presentation purposes
Column | Value |
---|---|
DatabaseName | MyDatabase |
LeaderClusterMetadataPath | https://storageaccountname.blob.core.windows.net/cluster |
CachingPolicyOverride | {"DataHotSpan":{"Value":"00:00:00"},"IndexHotSpan":{"Value":"00:00:00"}} |
AuthorizedPrincipalsOverride | [{"Principal":{"FullyQualifiedName":"aaduser=87654321-abcd-efef-1234-350bf486087b",…},{"Principal":{"FullyQualifiedName":"aaduser=54321789-abcd-efef-1234-350bf486087b",…}] |
AuthorizedPrincipalsModificationKind | Replace |
IsAutoPrefetchEnabled | False |
TableMetadataOverrides | {"MyTable2":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"3.00:00:00"}…},"MyTable1":{"CachingPolicyOverride":{"DataHotSpan":{"Value":"1.00:00:00"},…}}} |
CachingPoliciesModificationKind | Replace |
5 - Data soft delete
5.1 - Data soft delete
The ability to delete individual records is supported. Record deletion is commonly achieved using one of the following methods:
- To delete records with a system guarantee that the storage artifacts containing these records are deleted as well, use .purge.
- To delete records without such a guarantee, use .delete, as described in this article. This command marks records as deleted but doesn't necessarily delete the data from storage artifacts. This deletion method is faster than purge.
For information on how to use the command, see Syntax.
Use cases
This deletion method should only be used for the unplanned deletion of individual records. For example, if you discover that an IoT device is reporting corrupt telemetry for some time, you should consider using this method to delete the corrupt data.
If you need to frequently delete records for deduplication or updates, we recommend using materialized views. See choose between materialized views and soft delete for data deduplication.
Deletion process
The soft delete process is performed using the following steps:
1. Run predicate query: The table is scanned to identify data extents that contain records to be deleted. The extents identified are those with one or more records returned by the predicate query.
2. Extents replacement: The identified extents are replaced with new extents that point to the original data blobs, and also have a new hidden column of type bool that indicates per record whether it was deleted or not. Once completed, if no new data is ingested, the predicate query won't return any records if run again.
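The two steps above can be modeled conceptually. The following Python sketch is illustrative only: the hidden column name is made up, and the real engine works on immutable extents and storage blobs, not on in-memory lists. It shows why the predicate query returns no records after the operation even though the data still physically exists.

```python
def soft_delete(extent, predicate):
    """Mark rows matching the predicate as deleted; rows stay in storage."""
    for row in extent:
        if predicate(row):
            row["__deleted"] = True  # hypothetical hidden bool column
    return extent


def query(extent, predicate=lambda r: True):
    """Queries automatically filter out rows marked as deleted."""
    return [r for r in extent if not r.get("__deleted") and predicate(r)]


extent = [{"UserId": "X", "Value": 1}, {"UserId": "Y", "Value": 2}]
soft_delete(extent, lambda r: r["UserId"] == "X")

print(query(extent))  # the deleted record no longer appears in query results
print(len(extent))    # but it still physically exists in the extent
```

Because rows are only flagged, the operation is much cheaper than rewriting the data, which is the intuition behind .delete being faster than .purge.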
Limitations and considerations
- The deletion process is final and irreversible. It isn't possible to undo this process or recover data that has been deleted, even though the storage artifacts aren't necessarily deleted following the operation.
- Soft delete is supported for native tables and materialized views. It isn't supported for external tables.
- Before running soft delete, verify the predicate by running a query and checking that the results match the expected outcome. You can also run the command in whatif mode, which returns the number of records that are expected to be deleted.
- Don't run multiple parallel soft delete operations on the same table, as this may result in failures of some or all the commands. However, it's possible to run multiple parallel soft delete operations on different tables.
- Don't run soft delete and purge commands on the same table in parallel. First wait for one command to complete, and only then run the other command.
- Soft delete is executed against your cluster URI: https://[YourClusterName].[region].kusto.windows.net. The command requires database admin permissions on the relevant database.
- Deleting records from a table that is a source table of a materialized view can affect the materialized view. If the deleted records weren't yet processed by the materialization cycle, they'll be missing from the view, because they'll never be processed. Conversely, the deletion has no effect on the materialized view if the records have already been processed.
- Limitations on the predicate:
  - It must contain at least one where operator.
  - It can only reference the table from which records are to be deleted.
  - Only the following operators are allowed: extend, order, project, take, and where. Within toscalar(), the summarize operator is also allowed.
Deletion performance
The main considerations that can impact the deletion process performance are:
- Run predicate query: The performance of this step is very similar to the performance of the predicate itself. It might be slightly faster or slower depending on the predicate, but the difference is expected to be insignificant.
- Extents replacement: The performance of this step depends on the following:
- Record distribution across the data extents in the cluster
- The number of nodes in the cluster
Unlike .purge, the .delete command doesn't reingest the data. It just marks records that are returned by the predicate query as deleted, and is therefore much faster.
Query performance after deletion
Query performance isn’t expected to noticeably change following the deletion of records.
Performance degradation isn't expected, because the filter that's automatically added to all queries to exclude deleted records is efficient.
However, query performance is also not guaranteed to improve. While performance improvement may happen for some types of queries, it may not happen for some others. In order to improve query performance, extents in which most of the records are deleted are periodically compacted by replacing them with new extents that only contain the records that haven’t been deleted.
Impact on COGS (cost of goods sold)
In most cases, the deletion of records won’t result in a change of COGS.
- There will be no decrease, because no records are actually deleted. Records are only marked as deleted using a hidden column of type bool, the size of which is negligible.
- In most cases, there will be no increase, because the .delete operation doesn't require the provisioning of extra resources.
- In some cases, extents in which the majority of the records are deleted are periodically compacted by replacing them with new extents that only contain the records that haven't been deleted. This causes the deletion of the old storage artifacts that contain a large number of deleted records. The new extents are smaller and therefore consume less space in both the Storage account and in the hot cache. However, in most cases, the effect of this on COGS is negligible.
5.2 - Data soft delete command
To soft delete individual records without a system guarantee that the storage artifacts containing these records are deleted as well, use the following command. This command marks records as deleted but doesn’t necessarily delete the data from storage artifacts. For more information, see Soft delete.
To delete individual records with a system guarantee that the storage artifacts containing these records are deleted as well, see Data purge.
Syntax
.delete [async] table TableName records [with (propertyName = propertyValue [, …])] <| Predicate
Parameters
Name | Type | Required | Description |
---|---|---|---|
async | string | If specified, indicates that the command runs in asynchronous mode. | |
TableName | string | ✔️ | The name of the table from which to delete records. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. | |
Predicate | string | ✔️ | The predicate that returns records to delete, which is specified as a query. See note. |
Supported properties
Name | Type | Description |
---|---|---|
whatif | bool | If true , returns the number of records that will be deleted in every shard, without actually deleting any records. The default is false . |
Returns
The output of the command contains information about which extents were replaced.
Example: delete records of a given user
To delete all the records that contain data of a given user:
.delete table MyTable records <| MyTable | where UserId == 'X'
Example: check how many records would be deleted from a table
To determine the number of records that would be deleted by the operation without actually deleting them, check the value in the RecordsMatchPredicate column when running the command in whatif
mode:
.delete table MyTable records with (whatif=true) <| MyTable | where UserId == 'X'
.delete materialized-view records - soft delete command
When soft delete is executed on materialized views, the same concepts and limitations apply.
Syntax - materialized views
.delete [async] materialized-view MaterializedViewName records [with (propertyName = propertyValue [, …])] <| Predicate
Parameters - materialized views
Name | Type | Required | Description |
---|---|---|---|
async | string | If specified, indicates that the command runs in asynchronous mode. | |
MaterializedViewName | string | ✔️ | The name of the materialized view from which to delete records. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. | |
Predicate | string | ✔️ | The predicate that returns records to delete. Specified as a query. |
Supported properties - materialized views
Name | Type | Description |
---|---|---|
whatif | bool | If true , returns the number of records that will be deleted in every shard, without actually deleting any records. The default is false . |
Example - materialized views
To delete all the materialized view records that contain data of a given user:
.delete materialized-view MyMaterializedView records <| MyMaterializedView | where UserId == 'X'
Example: check how many records would be deleted from a materialized view
To determine the number of records that would be deleted by the operation without actually deleting them, check the value in the RecordsMatchPredicate column while running the command in whatif
mode:
.delete materialized-view MyMaterializedView records with (whatif=true) <| MyMaterializedView | where UserId == 'X'
Related content
6 - Extents (data shards)
6.1 - Extent tags
An extent tag is a string that describes properties common to all data in an extent. For example, during data ingestion, you can append an extent tag to signify the source of the ingested data. Then, you can use this tag for analysis.
Extents can hold multiple tags as part of their metadata. When extents merge, their tags also merge, ensuring consistent metadata representation.
To see the tags associated with an extent, use the .show extents command. For a granular view of tags associated with records within an extent, use the extent-tags() function.
drop-by extent tags
Tags that start with a drop-by: prefix can be used to control which other extents to merge with. Extents that have the same set of drop-by: tags can be merged together, but they won't be merged with other extents if they have a different set of drop-by: tags.
Examples
Determine which extents can be merged together
If:
- Extent 1 has the following tags: drop-by:blue, drop-by:red, green.
- Extent 2 has the following tags: drop-by:red, yellow.
- Extent 3 has the following tags: purple, drop-by:red, drop-by:blue.
Then:
- Extents 1 and 2 won't be merged together, as they have a different set of drop-by tags.
- Extents 2 and 3 won't be merged together, as they have a different set of drop-by tags.
- Extents 1 and 3 can be merged together, as they have the same set of drop-by tags.
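The merge-eligibility rule illustrated above boils down to comparing the sets of drop-by: tags while ignoring all other tags. A minimal Python sketch of that rule (illustrative, not engine code):

```python
def drop_by_tags(tags):
    """Extract only the drop-by: tags; other tags don't constrain merging."""
    return {t for t in tags if t.startswith("drop-by:")}


def can_merge(tags_a, tags_b):
    """Two extents can merge only if their drop-by: tag sets are identical."""
    return drop_by_tags(tags_a) == drop_by_tags(tags_b)


extent1 = ["drop-by:blue", "drop-by:red", "green"]
extent2 = ["drop-by:red", "yellow"]
extent3 = ["purple", "drop-by:red", "drop-by:blue"]

print(can_merge(extent1, extent2))  # False: different drop-by sets
print(can_merge(extent2, extent3))  # False: different drop-by sets
print(can_merge(extent1, extent3))  # True: same drop-by set
```

Note that the non-drop-by tags (green, yellow, purple) play no role in the decision; only the drop-by: subsets are compared.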
Use drop-by tags as part of extent-level operations
The following commands ingest data with a drop-by: tag, and then drop the extents that carry that tag.
.ingest ... with @'{"tags":"[\"drop-by:2016-02-17\"]"}'
.drop extents <| .show table MyTable extents where tags has "drop-by:2016-02-17"
ingest-by extent tags
Tags with the prefix ingest-by: can be used together with the ingestIfNotExists property to ensure that data is ingested only once.
The ingestIfNotExists property prevents duplicate ingestion by checking if an extent with the specified ingest-by: tag already exists. Typically, an ingest command contains an ingest-by: tag and the ingestIfNotExists property with the same value.
Examples
Add a tag on ingestion
The following command ingests the data and adds the tag ingest-by:2016-02-17.
.ingest ... with (tags = '["ingest-by:2016-02-17"]')
Prevent duplicate ingestion
The following command ingests the data so long as no extent in the table has the ingest-by:2016-02-17 tag.
.ingest ... with (ingestIfNotExists = '["2016-02-17"]')
Prevent duplicate ingestion and add a tag to any new data
The following command ingests the data so long as no extent in the table has the ingest-by:2016-02-17 tag. Any newly ingested data gets the ingest-by:2016-02-17 tag.
.ingest ... with (ingestIfNotExists = '["2016-02-17"]', tags = '["ingest-by:2016-02-17"]')
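Conceptually, ingestIfNotExists makes ingestion idempotent: the ingest is skipped when any extent in the table already carries the matching ingest-by: tag. The following Python sketch models that check; the function and field names are made up for illustration and don't reflect actual service internals.

```python
def ingest(table_extents, data, ingest_if_not_exists=None, tags=()):
    """Skip ingestion if an extent already carries the matching ingest-by: tag."""
    if ingest_if_not_exists is not None:
        guard = "ingest-by:" + ingest_if_not_exists
        if any(guard in ext["tags"] for ext in table_extents):
            return False  # already ingested; do nothing
    table_extents.append({"data": data, "tags": list(tags)})
    return True


extents = []
# First ingestion succeeds and stamps the ingest-by: tag on the new extent.
ingest(extents, ["row1"], ingest_if_not_exists="2016-02-17",
       tags=["ingest-by:2016-02-17"])
# Retrying with the same value is a no-op: the guard tag is already present.
ingest(extents, ["row1"], ingest_if_not_exists="2016-02-17",
       tags=["ingest-by:2016-02-17"])
print(len(extents))  # 1
```

This is why an ingest command typically pairs the ingest-by: tag and the ingestIfNotExists property with the same value: the tag written by the first ingestion is exactly what the guard checks for on retries.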
Limitations
- Extent tags can only be applied to records within an extent. Consequently, tags can’t be set on streaming ingestion data before it is stored in extents.
- Extent tags can’t be stored on data in external tables or materialized views.
Related content
6.2 - Extents (data shards)
Tables are partitioned into extents, or data shards. Each extent is a horizontal segment of the table that contains data and metadata such as its creation time and optional tags. The union of all these extents contains the entire dataset of the table. Extents are evenly distributed across nodes in the cluster, and they’re cached in both local SSD and memory for optimized performance.
Extents are immutable, meaning they can be queried, reassigned to a different node, or dropped out of the table but never modified. Data modification happens by creating new extents and transactionally swapping old extents with the new ones. The immutability of extents provides benefits such as increased robustness and easy reversion to previous snapshots.
Extents hold a collection of records that are physically arranged in columns, enabling efficient encoding and compression of the data. To maintain query efficiency, smaller extents are merged into larger extents according to the configured merge policy and sharding policy. Merging extents reduces management overhead and leads to index optimization and improved compression.
The common extent lifecycle is as follows:
- The extent is created by an ingestion operation.
- The extent is merged with other extents.
- The merged extent (possibly one that tracks its lineage to other extents) is eventually dropped because of a retention policy.
Extent creation time
Two datetime values are tracked per extent: MinCreatedOn and MaxCreatedOn. These values are initially identical, but can change when the extent is merged with other extents: the merged extent takes the minimum and maximum of the original values across all merged extents.
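As a quick illustration, the merged extent's MinCreatedOn/MaxCreatedOn span the creation times of all its source extents. The sketch below is a conceptual Python model, not engine code.

```python
from datetime import datetime


def merge_creation_times(extents):
    """Each extent is a (MinCreatedOn, MaxCreatedOn) pair; a freshly
    ingested extent has both values equal. The merged extent spans
    the minimum and maximum across all source extents."""
    return (min(e[0] for e in extents), max(e[1] for e in extents))


e1 = (datetime(2024, 1, 1), datetime(2024, 1, 1))  # fresh extent
e2 = (datetime(2024, 1, 5), datetime(2024, 1, 5))  # fresh extent
merged = merge_creation_times([e1, e2])
print(merged)  # spans 2024-01-01 through 2024-01-05
```

This is why retention and caching decisions (described next) can reason about a merged extent's age from its creation-time range.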
The creation time of an extent is used for the following purposes:
- Retention: Extents created earlier are dropped earlier.
- Caching: Extents created recently are kept in hot cache.
- Sampling: Recent extents are preferred when using query operations such as take.
To overwrite the creation time of an extent, provide an alternate creationTime
in the data ingestion properties. This can be useful for retention purposes, such as if you want to reingest data but don’t want it to appear as if it arrived late.