This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Best practices for KQL queries

1 - Best practices for Kusto Query Language queries

This article describes Query best practices.

Here are several best practices to follow to make your query run faster.

In short

ActionUseDon’t useNotes
Reduce the amount of data being queriedUse mechanisms such as the where operator to reduce the amount of data being processed.For more information on efficient ways to reduce the amount of data being processed, see Reduce the amount of data being processed.
Avoid using redundant qualified referencesWhen referencing local entities, use the unqualified name.For more information, see Avoid using redundant qualified references.
datetime columnsUse the datetime data type.Don’t use the long data type.In queries, don’t use Unix time conversion functions, such as unixtime_milliseconds_todatetime(). Instead, use update policies to convert Unix time to the datetime data type during ingestion.
String operatorsUse the has operator.Don’t use containsWhen looking for full tokens, has works better, since it doesn’t look for substrings.
Case-sensitive operatorsUse ==.Don’t use =~.Use case-sensitive operators when possible.
Use in.Don’t use in~.
Use contains_cs.Don’t use contains.Using has/has_cs is preferred to contains/contains_cs.
Searching textLook in a specific column.Don’t use *.* does a full text search across all columns.
Extract fields from dynamic objects across millions of rowsMaterialize your column at ingestion time if most of your queries extract fields from dynamic objects across millions of rows.With this method you only pay once for column extraction.
Lookup for rare keys/values in dynamic objectsUse `MyTablewhere DynamicColumn has “Rare value”where DynamicColumn.SomeKey == “Rare value”`.
let statement with a value that you use more than onceUse the materialize() function.For more information on how to use materialize(), see materialize(). For more information, see Optimize queries that use named expressions.
Apply type conversions on more than one billion recordsReshape your query to reduce the amount of data fed into the conversion.Don’t convert large amounts of data if it can be avoided.
New queriesUse limit [small number] or count at the end.Running unbound queries over unknown datasets can yield a return of gigabytes of results, resulting in a slow response and a busy environment.
Case-insensitive comparisonsUse Col =~ "lowercasestring".Don’t use tolower(Col) == "lowercasestring".
Compare data already in lowercase (or uppercase)Col == "lowercasestring" (or Col == "UPPERCASESTRING").Avoid using case insensitive comparisons.
Filtering on columnsFilter on a table column.Don’t filter on a calculated column.
Use `Twhere predicate(Expression)`Don’t use `T
summarize operatorUse the hint.shufflekey=<key> when the group by keys of the summarize operator have high cardinality.High cardinality is ideally more than one million.
join operatorSelect the table with the fewest rows as the first one (left-most in query).
Use in instead of left semi join for filtering by a single column.
Join across clustersRun the query on the “right” side of the join across remote environments, such as clusters or Eventhouses, where most of the data is located.
Join when left side is small and right side is largeUse hint.strategy=broadcast.Small refers to up to 100 megabytes (MB) of data.
Join when right side is small and left side is largeUse the lookup operator instead of the join operatorIf the right side of the lookup is larger than several tens of MB, the query fails.
Join when both sides are too largeUse hint.shufflekey=<key>.Use when the join key has high cardinality.
Extract values on column with strings sharing the same format or patternUse the parse operator.Don’t use several extract() statements.For example, values like "Time = <time>, ResourceId = <resourceId>, Duration = <duration>, ....".
extract() functionUse when parsed strings don’t all follow the same format or pattern.Extract the required values by using a REGEX.
materialize() functionPush all possible operators that reduce the materialized dataset and still keep the semantics of the query.For example, filters, or project only required columns. For more information, see Optimize queries that use named expressions.
Use materialized viewsUse materialized views for storing commonly used aggregations. Prefer using the materialized_view() function to query materialized part only.materialized_view('MV')

Reduce the amount of data being processed

A query’s performance depends directly on the amount of data it needs to process. The less data is processed, the quicker the query (and the fewer resources it consumes). Therefore, the most important best-practice is to structure the query in such a way that reduces the amount of data being processed.

In order of importance:

  • Only reference tables whose data is needed by the query. For example, when using the union operator with wildcard table references, it’s better from a performance point-of-view to only reference a handful of tables, instead of using a wildcard (*) to reference all tables and then filter data out using a predicate on the source table name.

  • Take advantage of a table’s data scope if the query is relevant only for a specific scope. The table() function provides an efficient way to eliminate data by scoping it according to the caching policy (the DataScope parameter).

  • Apply the where query operator immediately following table references.

  • When using the where query operator, the order in which you place the predicates, whether you use a single where operator, or multiple consecutive where operators, can have a significant effect on the query performance.

  • Apply predicates that act upon datetime table columns first. Kusto includes an efficient index on such columns, often completely eliminating whole data shards without needing to access those shards.

  • Then apply predicates that act upon string and dynamic columns, especially such predicates that apply at the term-level. Order the predicates by the selectivity. For example, searching for a user ID when there are millions of users is highly selective and usually involves a term search, for which the index is very efficient.

  • Then apply predicates that are selective and are based on numeric columns.

  • Last, for queries that scan a table column’s data (for example, for predicates such as contains "@!@!", that have no terms and don’t benefit from indexing), order the predicates such that the ones that scan columns with less data are first. Doing so reduces the need to decompress and scan large columns.

Avoid using redundant qualified references

Reference entities such as tables and materialized views by name.

For example, the table T can be referenced as simply T (the unqualified name), or by using a database qualifier (for example, database("DB").T when the table is in a database called DB), or by using a fully qualified name (for example, cluster("<serviceURL>").database("DB").T).

For example, the table T can be referenced as simply T (the unqualified name), or by using a database qualifier (for example, database("DB").T when the table is in a database called DB), or by using a fully qualified name (for example, cluster("X.Y.kusto.windows.net").database("DB").T).

It’s a best practice to avoid using name qualifications when they’re redundant, for the following reasons:

  1. Unqualified names are easier to identify (for a human reader) as belonging to the database-in-scope.

  2. Referencing database-in-scope entities is always at least as fast, and in some cases much faster, then entities that belong to other databases. This is especially true when those databases are in a different cluster. This is especially true when those databases are in a different Eventhouse. Avoiding qualified names helps the reader to do the right thing.

2 - Named expressions

Learn how to optimally use named expressions.

This article discusses how to optimize repeat use of named expressions in a query.

In Kusto Query Language, you can bind names to complex expressions in several different ways:

When you reference these named expressions in a query, the following steps occur:

  1. The calculation within the named expression is evaluated. This calculation produces either a scalar or tabular value.
  2. The named expression is replaced with the calculated value.

If the same bound name is used multiple times, then the underlying calculation will be repeated multiple times. When is this a concern?

  • When the calculations consume many resources and are used many times.
  • When the calculation is non-deterministic, but the query assumes all invocations to return the same value.

Mitigation

To mitigate these concerns, you can materialize the calculation results in memory during the query. Depending on the way the named calculation is defined, you’ll use different materialization strategies:

Tabular functions

Use the following strategies for tabular functions:

  • let statements and function parameters: Use the materialize() function.
  • as operator: Set the hint.materialized hint value to true.

For example, the following query uses the non-deterministic tabular sample operator:

Behavior without using the materialize function

range x from 1 to 100 step 1
| sample 1
| as T
| union T

Output

x
63
92

Behavior using the materialize function

range x from 1 to 100 step 1
| sample 1
| as hint.materialized=true T
| union T

Output

x
95
95

Scalar functions

Non-deterministic scalar functions can be forced to calculate exactly once by using toscalar().

For example, the following query uses the non-deterministic function, rand():

let x = () {rand(1000)};
let y = () {toscalar(rand(1000))};
print x, x, y, y

Output

print_0print_1print_2print_3
1661377070