
Tabular operators

1 - Join operator

1.1 - join flavors

1.1.1 - fullouter join

Learn how to use the fullouter join flavor to merge the rows of two tables.

A fullouter join combines the effect of applying both left and right outer-joins. For columns of the table that lack a matching row, the result set contains null values. For those records that do match, a single row is produced in the result set containing fields populated from both tables.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=fullouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from both tables with unmatched cells populated with null.

Example

This example query combines rows from both tables X and Y, filling in missing values with NULL where there’s no match in the other table. This allows you to see all possible combinations of keys from both tables.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=fullouter Y on Key

Output

| Key | Value1 | Key1 | Value2 |
|---|---|---|---|
| b | 3 | b | 10 |
| b | 2 | b | 10 |
| c | 4 | c | 20 |
| c | 4 | c | 30 |
|  |  | d | 40 |
| a | 1 |  |  |

1.1.2 - inner join

Learn how to use the inner join flavor to merge the rows of two tables.

The inner join flavor is like the standard inner join from the SQL world. An output record is produced whenever a record on the left side has the same join key as the record on the right side.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=inner [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: Only matching rows from both tables.

Example

The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'k',5,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40,
    'k',50
];
X | join kind=inner Y on Key

Output

| Key | Value1 | Key1 | Value2 |
|---|---|---|---|
| b | 3 | b | 10 |
| b | 2 | b | 10 |
| c | 4 | c | 20 |
| c | 4 | c | 30 |
| k | 5 | k | 50 |

1.1.3 - innerunique join

Learn how to use the innerunique join flavor to merge the rows of two tables.

The innerunique join flavor removes duplicate keys from the left side. This behavior ensures that the output contains a row for every combination of unique left and right keys.

By default, the innerunique join flavor is used if the kind parameter isn’t specified. This default implementation is useful in log/trace analysis scenarios, where you aim to correlate two events based on a shared correlation ID. It allows you to retrieve all instances of the phenomenon while disregarding duplicate trace records that contribute to the correlation.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=innerunique [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All deduplicated rows from the left table that match rows from the right table.

Examples

Review the examples and run them in your Data Explorer query page.

Use the default innerunique join

The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join Y on Key

Output

| Key | Value1 | Key1 | Value2 |
|---|---|---|---|
| b | 2 | b | 10 |
| c | 4 | c | 20 |
| c | 4 | c | 30 |

The query executed the default join, which is an inner join after deduplicating the left side based on the join key. The deduplication keeps only the first record. The resulting left side of the join after deduplication is:

| Key | Value1 |
|---|---|
| a | 1 |
| b | 2 |
| c | 4 |
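
To keep a row for every matching pair, including duplicate left-side keys, use the inner flavor described in the previous section instead of the default; with these datasets it also returns the matches for ('b', 3):

X | join kind=inner Y on Key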

Two possible outputs from innerunique join

let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3",
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key

Output

| key | value | key1 | value1 |
|---|---|---|---|
| 1 | val1.1 | 1 | val1.3 |
| 1 | val1.1 | 1 | val1.4 |

Running the same query again can legitimately return the other left record instead, because innerunique keeps an arbitrary single record per duplicate key:

let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3", 
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key

Output

| key | value | key1 | value1 |
|---|---|---|---|
| 1 | val1.2 | 1 | val1.3 |
| 1 | val1.2 | 1 | val1.4 |
  • Kusto is optimized to push filters that come after the join toward the appropriate join side, left or right, when possible.
  • When the flavor is innerunique and a filter is propagated to the left side of the join, the filter is applied before deduplication, so the keys that satisfy it always appear in the output.
  • Using the previous example with an added filter of where value == "val1.2", the query always gives the second result and never the first for these datasets:
let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3", 
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key
| where value == "val1.2"

Output

| key | value | key1 | value1 |
|---|---|---|---|
| 1 | val1.2 | 1 | val1.3 |
| 1 | val1.2 | 1 | val1.4 |
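
If you need a deterministic result that doesn't depend on how the optimizer propagates the filter, one option is to apply the filter before the join so that deduplication only sees the filtered rows. This is a minimal sketch using the same datasets:

t1
| where value == "val1.2"
| join kind=innerunique t2 on key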

Get extended sign-in activities

Get extended activities from sign-ins, where some entries mark the start and end of an activity.

let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityId, StartTime=timestamp
| join (Events
    | where Name == "Stop"
        | project StopTime=timestamp, ActivityId)
    on ActivityId
| project City, ActivityId, StartTime, StopTime, Duration = StopTime - StartTime

If the key columns have different names on the two sides, use the $left and $right notation in the join condition, as this variant of the query demonstrates:

let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityIdLeft = ActivityId, StartTime=timestamp
| join (Events
        | where Name == "Stop"
        | project StopTime=timestamp, ActivityIdRight = ActivityId)
    on $left.ActivityIdLeft == $right.ActivityIdRight
| project City, ActivityId = ActivityIdLeft, StartTime, StopTime, Duration = StopTime - StartTime

1.1.4 - leftanti join

Learn how to use the leftanti join flavor to merge the rows of two tables.

The leftanti join flavor returns all records from the left side that don’t match any record from the right side. The anti join models the “NOT IN” query.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftanti [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the left table.
Rows: All records from the left table that don’t match records from the right table.

Example

The example query combines rows from tables X and Y where there is no match in Y for the keys in X, effectively filtering out any rows in X that have corresponding rows in Y.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftanti Y on Key

Output

| Key | Value1 |
|---|---|
| a | 1 |

1.1.5 - leftouter join

Learn how to use the leftouter join flavor to merge the rows of two tables.

The leftouter join flavor returns all the records from the left side table and only matching records from the right side table.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from the left table and only matching rows from the right table.

Example

The result of a left outer join for tables X and Y always contains all records of the left table (X), even if the join condition doesn’t find any matching record in the right table (Y).

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftouter Y on Key

Output

| Key | Value1 | Key1 | Value2 |
|---|---|---|---|
| a | 1 |  |  |
| b | 2 | b | 10 |
| b | 3 | b | 10 |
| c | 4 | c | 20 |
| c | 4 | c | 30 |

1.1.6 - leftsemi join

Learn how to use the leftsemi join flavor to merge the rows of two tables.

The leftsemi join flavor returns all records from the left side that match a record from the right side. Only columns from the left side are returned.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftsemi [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the left table.
Rows: All records from the left table that match records from the right table.

Example

This query filters and returns only those rows from table X that have a matching key in table Y.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftsemi Y on Key

Output

| Key | Value1 |
|---|---|
| b | 2 |
| b | 3 |
| c | 4 |

1.1.7 - rightanti join

Learn how to use the rightanti join flavor to merge the rows of two tables.

The rightanti join flavor returns all records from the right side that don’t match any record from the left side. The anti join models the “NOT IN” query.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightanti [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the right table.
Rows: All records from the right table that don’t match records from the left table.

Example

This query filters and returns only those rows from table Y that do not have a matching key in table X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightanti Y on Key

Output

| Key | Value2 |
|---|---|
| d | 40 |

1.1.8 - rightouter join

Learn how to use the rightouter join flavor to merge the rows of two tables.

The rightouter join flavor returns all the records from the right side and only matching records from the left side. This join flavor resembles the leftouter join flavor, but the treatment of the tables is reversed.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from the right table and only matching rows from the left table.

Example

This query returns all rows from table Y and any matching rows from table X, filling in NULL values where there is no match from X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightouter Y on Key

Output

| Key | Value1 | Key1 | Value2 |
|---|---|---|---|
| b | 2 | b | 10 |
| b | 3 | b | 10 |
| c | 4 | c | 20 |
| c | 4 | c | 30 |
|  |  | d | 40 |

1.1.9 - rightsemi join

Learn how to use the rightsemi join flavor to merge the rows of two tables.

The rightsemi join flavor returns all records from the right side that match a record from the left side. Only columns from the right side are returned.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightsemi [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the right table.
Rows: All records from the right table that match records from the left table.

Example

This query filters and returns only those rows from table Y that have a matching key in table X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightsemi Y on Key

Output

| Key | Value2 |
|---|---|
| b | 10 |
| c | 20 |
| c | 30 |

1.2 - Broadcast join

Learn how to use the broadcast join execution strategy to distribute the join over nodes.

Regular joins are executed on a single cluster node. A broadcast join is an execution strategy that distributes the join across the cluster nodes. This strategy is useful when the left side of the join is small (up to several tens of megabytes). In this case, a broadcast join is more performant than a regular join.

Use the lookup operator if the right side is smaller than the left side. The lookup operator runs in broadcast strategy by default when the right side is smaller than the left.
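
As a minimal sketch of that alternative, assuming hypothetical tables FactTable (the large, left side) and DimTable (the small, right side):

FactTable
| lookup kind=leftouter DimTable on Key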

If the left side of the join is a small dataset, you can run the join in broadcast mode by using the following syntax (hint.strategy = broadcast):

leftSide
| join hint.strategy = broadcast (factTable) on Key

The performance improvement is more noticeable in scenarios where the join is followed by other operators such as summarize. See the following query for example:

leftSide 
| join hint.strategy = broadcast (factTable) on Key
| summarize dcount(Messages) by Timestamp, Key

1.3 - Cross-cluster join

Learn how to perform the Cross-cluster join operation to join datasets residing on different clusters.

A cross-cluster join involves joining data from datasets that reside in different clusters.

In a cross-cluster join, the query can be executed in three possible locations, each with a specific designation for reference throughout this document:

  • Local cluster: The cluster to which the request is sent, which is also known as the cluster hosting the database in context.
  • Left cluster: The cluster hosting the data on the left side of the join operation.
  • Right cluster: The cluster hosting the data on the right side of the join operation.

The cluster that runs the query fetches the data from the other cluster.

Syntax

[ cluster(ClusterName).database(DatabaseName).]LeftTable
| join [ hint.remote=Strategy ] (
  [ cluster(ClusterName).database(DatabaseName).]RightTable
) on Conditions

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| LeftTable | string | ✔️ | The left table or tabular expression whose rows are to be merged. Denoted as $left. |
| Strategy | string |  | Determines the cluster on which to execute the join. Supported values are: left, right, local, and auto. For more information, see Strategies. |
| ClusterName | string |  | If the data for the join resides outside of the local cluster, use the cluster() function to specify the cluster. |
| DatabaseName | string |  | If the data for the join resides outside of the local database context, use the database() function to specify the database. |
| RightTable | string | ✔️ | The right table or tabular expression whose rows are to be merged. Denoted as $right. |
| Conditions | string | ✔️ | Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left.LeftColumn == $right.RightColumn. To specify multiple conditions, you can either use the "and" keyword or separate them with commas. If you use commas, the conditions are evaluated using the "and" logical operator. |

Strategies

The following list explains the supported values for the Strategy parameter:

  • left: Execute join on the cluster of the left table, or left cluster.
  • right: Execute join on the cluster of the right table, or right cluster.
  • local: Execute join on the current cluster, or local cluster.
  • auto: (Default) Kusto makes the remoting decision.

How the auto strategy works

By default, the auto strategy determines where the cross-cluster join is executed based on the following rules:

  • If one of the tables is hosted in the local cluster, then the join is performed on the local cluster. For example, with the auto strategy, this query is executed on the local cluster:

    T | ... | join (cluster("B").database("DB").T2 | ...) on Col1
    
  • If both tables are hosted outside of the local cluster, then join is performed on the right cluster. For example, assuming neither cluster is the local cluster, the join would be executed on the right cluster:

    cluster("B").database("DB").T | ... | join (cluster("C").database("DB2").T2 | ...) on Col1
    

Performance considerations

For optimal performance, we recommend running the query on the cluster that contains the largest table.

In the following example, if the dataset produced by T | ... is smaller than the one produced by cluster("B").database("DB").T2 | ..., then it would be more efficient to execute the join operation on cluster B (in this case, the right cluster) instead of on the local cluster.

T | ... | join (cluster("B").database("DB").T2 | ...) on Col1

You can rewrite the query to use hint.remote=right to optimize performance. In this way, the join operation is performed on the right cluster, even if the left table is in the local cluster.

T | ... | join hint.remote=right (cluster("B").database("DB").T2 | ...) on Col1

1.4 - join operator

Learn how to use the join operator to merge the rows of two tables.

Merge the rows of two tables to form a new table by matching values of the specified columns from each table.

Kusto Query Language (KQL) offers many kinds of joins that each affect the schema and rows in the resultant table in different ways. For example, if you use an inner join, the table has the same columns as the left table, plus the columns from the right table. For best performance, if one table is always smaller than the other, use it as the left side of the join operator.
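
For example, with hypothetical tables SmallTable and LargeTable, the smaller table goes on the left:

SmallTable
| join kind=inner (LargeTable) on Key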

The following image provides a visual representation of the operation performed by each join. The color of the shading represents the columns returned, and the areas shaded represent the rows returned.

Diagram showing query join kinds.

Syntax

LeftTable | join [ kind = JoinFlavor ] [ Hints ] (RightTable) on Conditions

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| LeftTable | string | ✔️ | The left table or tabular expression, sometimes called the outer table, whose rows are to be merged. Denoted as $left. |
| JoinFlavor | string |  | The type of join to perform: innerunique, inner, leftouter, rightouter, fullouter, leftanti, rightanti, leftsemi, rightsemi. The default is innerunique. For more information about join flavors, see Returns. |
| Hints | string |  | Zero or more space-separated join hints in the form of Name = Value that control the behavior of the row-match operation and execution plan. For more information, see Hints. |
| RightTable | string | ✔️ | The right table or tabular expression, sometimes called the inner table, whose rows are to be merged. Denoted as $right. |
| Conditions | string | ✔️ | Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left.LeftColumn == $right.RightColumn. To specify multiple conditions, you can either use the "and" keyword or separate them with commas. If you use commas, the conditions are evaluated using the "and" logical operator. |
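
For instance, multiple conditions can mix same-name columns and explicit $left/$right equality. The following sketch assumes hypothetical tables T1 and T2 that share a Key column but name their date columns differently:

T1
| join kind=inner (T2) on Key, $left.StartDate == $right.Date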

Hints

| Hint key | Values | Description |
|---|---|---|
| hint.remote | auto, left, local, right | See Cross-cluster join |
| hint.strategy=broadcast | Specifies the way to share the query load on cluster nodes. | See broadcast join |
| hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. | See shuffle query |
| hint.strategy=shuffle | The shuffle strategy query shares the query load on cluster nodes, where each node processes one partition of the data. | See shuffle query |

Returns

The return schema and rows depend on the join flavor. The join flavor is specified with the kind keyword. The following table shows the supported join flavors. To see examples for a specific join flavor, select the link in the Join flavor column.

| Join flavor | Returns |
|---|---|
| innerunique (default) | Inner join with left side deduplication. Schema: all columns from both tables, including the matching keys. Rows: all deduplicated rows from the left table that match rows from the right table. |
| inner | Standard inner join. Schema: all columns from both tables, including the matching keys. Rows: only matching rows from both tables. |
| leftouter | Left outer join. Schema: all columns from both tables, including the matching keys. Rows: all records from the left table and only matching rows from the right table. |
| rightouter | Right outer join. Schema: all columns from both tables, including the matching keys. Rows: all records from the right table and only matching rows from the left table. |
| fullouter | Full outer join. Schema: all columns from both tables, including the matching keys. Rows: all records from both tables with unmatched cells populated with null. |
| leftsemi | Left semi join. Schema: all columns from the left table. Rows: all records from the left table that match records from the right table. |
| leftanti, anti, leftantisemi | Left anti join and semi variant. Schema: all columns from the left table. Rows: all records from the left table that don't match records from the right table. |
| rightsemi | Right semi join. Schema: all columns from the right table. Rows: all records from the right table that match records from the left table. |
| rightanti, rightantisemi | Right anti join and semi variant. Schema: all columns from the right table. Rows: all records from the right table that don't match records from the left table. |

Cross-join

KQL doesn’t provide a cross-join flavor. However, you can achieve a cross-join effect by using a placeholder key approach.

In the following example, a placeholder key is added to both tables and then used for the inner join operation, effectively achieving a cross-join-like behavior:

X | extend placeholder=1 | join kind=inner (Y | extend placeholder=1) on placeholder
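
For a runnable sketch, the same technique applied to two small datatable literals (hypothetical values) produces all four pairings:

let X = datatable(Id:string) ['a', 'b'];
let Y = datatable(Num:long) [1, 2];
X | extend placeholder=1 | join kind=inner (Y | extend placeholder=1) on placeholder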

1.5 - Joining within time window

Learn how to perform a time window join operation to match between two large datasets.

It’s often useful to join between two large datasets on some high-cardinality key, such as an operation ID or a session ID, and further limit the right-hand-side ($right) records that need to match up with each left-hand-side ($left) record by adding a restriction on the “time-distance” between datetime columns on the left and on the right.

The above operation differs from the usual join operation, since for the equi-join part of matching the high-cardinality key between the left and right datasets, the system can also apply a distance function and use it to considerably speed up the join.

Example to identify event sequences without time window

To identify event sequences within a relatively small time window, this example uses a table T with the following schema:

  • SessionId: A column of type string with correlation IDs.
  • EventType: A column of type string that identifies the event type of the record.
  • Timestamp: A column of type datetime indicates when the event described by the record happened.

| SessionId | EventType | Timestamp |
|---|---|---|
| 0 | A | 2017-10-01T00:00:00Z |
| 0 | B | 2017-10-01T00:01:00Z |
| 1 | B | 2017-10-01T00:02:00Z |
| 1 | A | 2017-10-01T00:03:00Z |
| 3 | A | 2017-10-01T00:04:00Z |
| 3 | B | 2017-10-01T00:10:00Z |

The following query creates the dataset and then identifies all the session IDs in which event type A was followed by an event type B within a 1min time window.

let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
    '0', 'A', datetime(2017-10-01 00:00:00),
    '0', 'B', datetime(2017-10-01 00:01:00),
    '1', 'B', datetime(2017-10-01 00:02:00),
    '1', 'A', datetime(2017-10-01 00:03:00),
    '3', 'A', datetime(2017-10-01 00:04:00),
    '3', 'B', datetime(2017-10-01 00:10:00),
];
T
| where EventType == 'A'
| project SessionId, Start=Timestamp
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Timestamp
    ) on SessionId
| where (End - Start) between (0min .. 1min)
| project SessionId, Start, End 

Output

| SessionId | Start | End |
|---|---|---|
| 0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |

Example optimized with time window

To optimize this query, rewrite it to account for the time window by expressing the window as a join key: "discretize" the datetime values into buckets whose size is half the size of the time window, and use an equi-join to compare the bucket IDs.

The query finds pairs of events within the same session (SessionId) where an ‘A’ event is followed by a ‘B’ event within 1 minute. It projects the session ID, the start time of the ‘A’ event, and the end time of the ‘B’ event.

let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
    '0', 'A', datetime(2017-10-01 00:00:00),
    '0', 'B', datetime(2017-10-01 00:01:00),
    '1', 'B', datetime(2017-10-01 00:02:00),
    '1', 'A', datetime(2017-10-01 00:03:00),
    '3', 'A', datetime(2017-10-01 00:04:00),
    '3', 'B', datetime(2017-10-01 00:10:00),
];
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T 
| where EventType == 'A'
| project SessionId, Start=Timestamp, TimeKey = bin(Timestamp, lookupBin)
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Timestamp,
              TimeKey = range(bin(Timestamp-lookupWindow, lookupBin),
                              bin(Timestamp, lookupBin),
                              lookupBin)
    | mv-expand TimeKey to typeof(datetime)
    ) on SessionId, TimeKey 
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End 

Output

| SessionId | Start | End |
|---|---|---|
| 0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |

5 million record query

The next query emulates an extensive dataset of 5M records and approximately 1M Session IDs and runs the query with the time window technique.

let T = range x from 1 to 5000000 step 1
| extend SessionId = rand(1000000), EventType = rand(3), Time=datetime(2017-01-01)+(x * 10ms)
| extend EventType = case(EventType < 1, "A",
                          EventType < 2, "B",
                          "C");
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T 
| where EventType == 'A'
| project SessionId, Start=Time, TimeKey = bin(Time, lookupBin)
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Time, 
              TimeKey = range(bin(Time-lookupWindow, lookupBin), 
                              bin(Time, lookupBin),
                              lookupBin)
    | mv-expand TimeKey to typeof(datetime)
    ) on SessionId, TimeKey 
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End 
| count 

Output

| Count |
|---|
| 3344 |

2 - Render operator

2.1 - visualizations

2.1.1 - Anomaly chart visualization

This article describes the anomaly chart visualization.

The anomaly chart visualization is similar to a timechart, but highlights anomalies using the series_decompose_anomalies function.

Syntax

T | render anomalychart [with ( propertyName = propertyValue [, …])]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| accumulate | Whether the value of each measure gets added to all its predecessors (true or false). |
| legend | Whether to display a legend or not (visible or hidden). |
| series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
| ymin | The minimum value to be displayed on the y-axis. |
| ymax | The maximum value to be displayed on the y-axis. |
| title | The title of the visualization (of type string). |
| xaxis | How to scale the x-axis (linear or log). |
| xcolumn | Which column in the result is used for the x-axis. |
| xtitle | The title of the x-axis (of type string). |
| yaxis | How to scale the y-axis (linear or log). |
| ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
| ysplit | How to split the visualization into multiple y-axis values. For more information, see Multiple y-axes. |
| ytitle | The title of the y-axis (of type string). |
| anomalycolumns | Comma-delimited list of columns, which are considered anomaly series and displayed as points on the chart. |

ysplit property

This visualization supports splitting into multiple y-axis values. The supported values of this property are:

| ysplit | Description |
|---|---|
| none | A single y-axis is displayed for all series data. (Default) |
| axes | A single chart is displayed with multiple y-axes (one per series). |
| panels | One chart is rendered for each ycolumn value. Maximum five panels. |

Example

The example in this section shows how to use the syntax to help you get started.

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') //use "| render anomalychart with anomalycolumns=anomalies" to render the anomalies as bold points on the series charts.

Screenshot of anomaly chart output.

2.1.2 - Area chart visualization

This article describes the area chart visualization.

The area chart visual shows a time-series relationship. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.

Syntax

T | render areachart [with (propertyName = propertyValue [, …])]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| accumulate | Whether the value of each measure gets added to all its predecessors (true or false). |
| kind | Further elaboration of the visualization kind. For more information, see kind property. |
| legend | Whether to display a legend or not (visible or hidden). |
| series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
| ymin | The minimum value to be displayed on the y-axis. |
| ymax | The maximum value to be displayed on the y-axis. |
| title | The title of the visualization (of type string). |
| xaxis | How to scale the x-axis (linear or log). |
| xcolumn | Which column in the result is used for the x-axis. |
| xtitle | The title of the x-axis (of type string). |
| yaxis | How to scale the y-axis (linear or log). |
| ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
| ysplit | How to split the y-axis values for multiple visualizations. |
| ytitle | The title of the y-axis (of type string). |

ysplit property

This visualization supports splitting into multiple y-axis values:

| ysplit | Description |
|---|---|
| none | A single y-axis is displayed for all series data. (Default) |
| axes | A single chart is displayed with multiple y-axes (one per series). |
| panels | One chart is rendered for each ycolumn value. Maximum five panels. |


kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

| kind value | Description |
|---|---|
| default | Each "area" stands on its own. |
| unstacked | Same as default. |
| stacked | Stack "areas" to the right. |
| stacked100 | Stack "areas" to the right and stretch each one to the same width as the others. |

Examples

The examples in this section show how to use the syntax to help you get started.

Simple area chart

The following example shows a basic area chart visualization.

demo_series3
| render areachart

Screenshot of area chart visualization.
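
The kind values listed above apply here; for example, a hedged variant of the same query that stacks the series:

demo_series3
| render areachart with (kind=stacked)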

Area chart using properties

The following example shows an area chart using multiple property settings.

OccupancyDetection
| summarize avg_temp= avg(Temperature), avg_humidity= avg(Humidity) by bin(Timestamp, 1h)
| render areachart
    with ( 
        kind = unstacked,
        legend = visible,
        ytitle ="Sample value",
        ymin = 10,
        ymax =100,
        xtitle = "Time",    
        title ="Humidity and temperature"
    )

Screenshot of area chart visualization with properties.

Area chart using split panels

The following example shows an area chart using split panels. In this example, the ysplit property is set to panels.

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count=count() by State, bin(StartTime, 1d)
| render areachart
    with (
        ysplit= panels,
        legend = visible,
        ycolumns=count,
        yaxis =log,
        ytitle ="Count",
        ymin = 0,
        ymax =100,
        xaxis = linear,
        xcolumn = StartTime,
        xtitle = "Date",    
        title ="Hail events"
    )

Screenshot of area chart visualization with split panels.

2.1.3 - Bar chart visualization

This article describes the bar chart visualization.

The bar chart visual needs a minimum of two columns in the query result. By default, the first column is used as the y-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the x-axis and contain numeric data types to be displayed as horizontal lines. Bar charts are used mainly for comparing numeric and nominal discrete values, where the length of each line represents its value.

Syntax

T | render barchart [with (propertyName = propertyValue [, …])]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| accumulate | Whether the value of each measure gets added to all its predecessors (true or false). |
| kind | Further elaboration of the visualization kind. For more information, see kind property. |
| legend | Whether to display a legend or not (visible or hidden). |
| series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
| ymin | The minimum value to be displayed on the y-axis. |
| ymax | The maximum value to be displayed on the y-axis. |
| title | The title of the visualization (of type string). |
| xaxis | How to scale the x-axis (linear or log). |
| xcolumn | Which column in the result is used for the x-axis. |
| xtitle | The title of the x-axis (of type string). |
| yaxis | How to scale the y-axis (linear or log). |
| ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
| ytitle | The title of the y-axis (of type string). |
| ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |

ysplit property

This visualization supports splitting into multiple y-axis values:

| ysplit | Description |
|---|---|
| none | A single y-axis is displayed for all series data. This is the default. |
| axes | A single chart is displayed with multiple y-axes (one per series). |
| panels | One chart is rendered for each ycolumn value. Maximum five panels. |


kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

| kind value | Description |
|---|---|
| default | Each "bar" stands on its own. |
| unstacked | Same as default. |
| stacked | Stack "bars". |
| stacked100 | Stack "bars" and stretch each one to the same width as the others. |

Examples

The examples in this section show how to use the syntax to help you get started.

Render a bar chart

The following query creates a bar chart displaying the number of storm events for each state, filtering only those states with more than 10 events. The chart provides a visual representation of the event distribution across different states.

StormEvents
| summarize event_count=count() by State
| where event_count > 10
| project State, event_count
| render barchart
    with (
    title="Storm count by state",
    ytitle="Storm count",
    xtitle="State",
    legend=hidden
    )

Screenshot of a labeled bar chart.

Render a stacked bar chart

The following query creates a stacked bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. Each bar represents a storm event type, and the stacked bars show the breakdown of storm events by state within each type.

StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked)

Screenshot of a stacked bar chart visualization.

Render a stacked100 bar chart

The following query creates a stacked100 bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. The chart shows the distribution of storm events across states within each type. Although the stacks visually sum up to 100, the values actually represent the number of events, not percentages. This visualization is helpful for understanding both the percentages and the actual event counts.

StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked100)

Screenshot of a stacked 100 bar chart visualization.

Use the ysplit property

The following query provides a daily summary of storm-related injuries and deaths, visualized as a bar chart with split axes/panels for better comparison.

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=axes)

Screenshot of column chart using ysplit axes property.

To split the view into separate panels, specify panels instead of axes:

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=panels)

Screenshot of column chart using ysplit panels property.

2.1.4 - Card visualization

This article describes the card visualization.

The card visual only shows one element. If there are multiple columns and rows in the output, the first result record is treated as a set of scalar values and shown as a card.

Syntax

T | render card [with (propertyName = propertyValue [, …])]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| title | The title of the visualization (of type string). |

Example

This query provides a count of flood events in Virginia and displays the result in a card format.

StormEvents
| where State=="VIRGINIA" and EventType=="Flood"
| count
| render card with (title="Floods in Virginia")

Screenshot of card visual.

2.1.5 - Column chart visualization

This article describes the column chart visualization.

The column chart visual needs a minimum of two columns in the query result. By default, the first column is used as the x-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the y-axis and contain numeric data types to be displayed as vertical lines. Column charts are used for comparing specific subcategory items in a main category range, where the length of each line represents its value.

Syntax

T | render columnchart [with (propertyName = propertyValue [, …])]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| accumulate | Whether the value of each measure gets added to all its predecessors (true or false). |
| kind | Further elaboration of the visualization kind. For more information, see kind property. |
| legend | Whether to display a legend or not (visible or hidden). |
| series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
| ymin | The minimum value to be displayed on the y-axis. |
| ymax | The maximum value to be displayed on the y-axis. |
| title | The title of the visualization (of type string). |
| xaxis | How to scale the x-axis (linear or log). |
| xcolumn | Which column in the result is used for the x-axis. |
| xtitle | The title of the x-axis (of type string). |
| yaxis | How to scale the y-axis (linear or log). |
| ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
| ytitle | The title of the y-axis (of type string). |
| ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |

ysplit property

This visualization supports splitting into multiple y-axis values:

| ysplit | Description |
|---|---|
| none | A single y-axis is displayed for all series data. This is the default. |
| axes | A single chart is displayed with multiple y-axes (one per series). |
| panels | One chart is rendered for each ycolumn value. Maximum five panels. |


kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

| kind value | Definition |
|---|---|
| default | Each "column" stands on its own. |
| unstacked | Same as default. |
| stacked | Stack "columns" one atop the other. |
| stacked100 | Stack "columns" and stretch each one to the same height as the others. |

Examples

The examples in this section show how to use the syntax to help you get started.

Render a column chart

This query provides a visual representation of states with a high frequency of storm events, specifically those with more than 10 events, using a column chart.

StormEvents
| summarize event_count=count() by State
| where event_count > 10
| project State, event_count
| render columnchart

Screenshot of column chart visualization.
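
The kind property behaves as it does for bar charts; for example, this sketch mirrors the earlier stacked bar chart example but renders vertical stacked columns:

StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| render columnchart with (kind=stacked)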

Use the ysplit property

This query provides a daily summary of storm-related injuries and deaths, visualized as a column chart with split axes/panels for better comparison.

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=axes)

Screenshot of column chart using ysplit axes property.

To split the view into separate panels, specify panels instead of axes:

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=panels)

Screenshot of column chart using ysplit panels property.


2.1.6 - Ladder chart visualization

This article describes the ladder chart visualization.

In a ladder chart visual, the last two columns are the x-axis, and the other columns are the y-axis.

Syntax

T | render ladderchart [with (propertyName = propertyValue [, …])]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| accumulate | Whether the value of each measure gets added to all its predecessors (true or false). |
| legend | Whether to display a legend or not (visible or hidden). |
| series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
| ymin | The minimum value to be displayed on the y-axis. |
| ymax | The maximum value to be displayed on the y-axis. |
| title | The title of the visualization (of type string). |
| xaxis | How to scale the x-axis (linear or log). |
| xcolumn | Which column in the result is used for the x-axis. |
| xtitle | The title of the x-axis (of type string). |
| yaxis | How to scale the y-axis (linear or log). |
| ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
| ytitle | The title of the y-axis (of type string). |

Examples

The examples in this section show how to use the syntax to help you get started.

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

Dates of storms by state

This query outputs a state-wise visualization of the duration of rain-related storm events, displayed as a ladder chart to help you analyze the temporal distribution of these events.

StormEvents
| where EventType  has "rain"
| summarize min(StartTime), max(EndTime) by State
| render ladderchart

Screenshot of ladderchart showing dates of storms by state.

Dates of storms by event type

This query outputs a visualization of the duration of various storm events in Washington, displayed as a ladder chart to help you analyze the temporal distribution of these events by type.

StormEvents
| where State == "WASHINGTON"
| summarize min(StartTime), max(EndTime) by EventType
| render ladderchart

Screenshot of ladderchart showing dates of storms by event type.

Dates of storms by state and event type

This query outputs a visualization of the duration of various storm events in states starting with “W”, displayed as a ladder chart to help you analyze the temporal distribution of these events by state and event type.

StormEvents
| where State startswith "W"
| summarize min(StartTime), max(EndTime) by State, EventType
| render ladderchart with (series=State, EventType)

Screenshot of ladderchart showing dates of storms by state and event type.

2.1.7 - Line chart visualization

This article describes the line chart visualization.

The line chart visual is the most basic type of chart. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Line charts track changes over short and long periods of time. When smaller changes exist, line graphs are more useful than bar graphs.

Syntax

T | render linechart [with ( propertyName = propertyValue [, …] )]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| accumulate | Whether the value of each measure gets added to all its predecessors (true or false). |
| legend | Whether to display a legend or not (visible or hidden). |
| series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
| ymin | The minimum value to be displayed on the y-axis. |
| ymax | The maximum value to be displayed on the y-axis. |
| title | The title of the visualization (of type string). |
| xaxis | How to scale the x-axis (linear or log). |
| xcolumn | Which column in the result is used for the x-axis. |
| xtitle | The title of the x-axis (of type string). |
| yaxis | How to scale the y-axis (linear or log). |
| ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
| ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
| ytitle | The title of the y-axis (of type string). |

ysplit property

This visualization supports splitting into multiple y-axis values:

| ysplit | Description |
|---|---|
| none | A single y-axis is displayed for all series data. (Default) |
| axes | A single chart is displayed with multiple y-axes (one per series). |
| panels | One chart is rendered for each ycolumn value. Maximum five panels. |

Examples

The examples in this section show how to use the syntax to help you get started.

Render a line chart

This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart.

StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart 

Screenshot of line chart visualization output.

Label a line chart

This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified titles for better clarity and presentation.

StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart
    with (
    title="Property damage from storms in Virginia",
    xtitle="Start time of storm",
    ytitle="Property damage"
    )

Screenshot of line chart with labels.

Limit values displayed on the y-axis

This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified y-axis limits for better visualization of the data.

StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart with (ymin=7000, ymax=300000)

Screenshot of line chart with limitations on y-axis values.

View multiple y-axes

This query retrieves hail events in Texas, Nebraska, and Kansas. It counts the number of hail events per day for each state, and then displays this information in a line chart with separate panels for each state.

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render linechart with (ysplit=panels)

Screenshot of the time chart query result with the ysplit panels property.

2.1.8 - Pie chart visualization

This article describes the pie chart visualization.

The pie chart visual needs a minimum of two columns in the query result. By default, the first column is used as the color axis. This column can contain text, datetime, or numeric data types. Other columns will be used to determine the size of each slice and contain numeric data types. Pie charts are used for presenting a composition of categories and their proportions out of a total.

The pie chart visual can also be used in the context of Geospatial visualizations.

Syntax

T | render piechart [with (propertyName = propertyValue [, …])]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |
| propertyName, propertyValue | string |  | A comma-separated list of key-value property pairs. See supported properties. |

Supported properties

All properties are optional.

| PropertyName | PropertyValue |
|---|---|
| accumulate | Whether the value of each measure gets added to all its predecessors (true or false). |
| kind | Further elaboration of the visualization kind. For more information, see kind property. |
| legend | Whether to display a legend or not (visible or hidden). |
| series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
| title | The title of the visualization (of type string). |
| xaxis | How to scale the x-axis (linear or log). |
| xcolumn | Which column in the result is used for the x-axis. |
| xtitle | The title of the x-axis (of type string). |
| yaxis | How to scale the y-axis (linear or log). |
| ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
| ytitle | The title of the y-axis (of type string). |

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

| kind value | Description |
|---|---|
| map | Expected columns are [Longitude, Latitude] or GeoJSON point, color-axis, and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations. |

Example

This query provides a visual representation of the top 10 states with the highest number of storm events, displayed as a pie chart.

StormEvents
| summarize statecount=count() by State
| sort by statecount 
| limit 10
| render piechart with(title="Storm Events by State")

Screenshot of pie chart visualization output.

2.1.9 - Pivot chart visualization

This article describes the pivot chart visualization.

Displays a pivot table and chart. You can interactively select data, columns, rows, and various chart types.

Syntax

T | render pivotchart

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | Input table name. |

Example

This query provides a detailed analysis of sales for Contoso computer products within the specified date range, visualized as a pivot chart.

SalesFact
| join kind= inner Products on ProductKey
| where ProductCategoryName has "Computers" and ProductName has "Contoso"
| where DateKey between (datetime(2006-12-31) .. datetime(2007-02-01))
| project SalesAmount, ProductName, DateKey
| render pivotchart

Output

Screenshot of query result showing a pivot chart visualization.

2.1.10 - Plotly visualization

This article describes how to visualize data using the Plotly graphics library.

The Plotly graphics library supports ~80 chart types that are useful for advanced charting including geographic, scientific, machine learning, 3d, animation, and many other chart types. For more information, see Plotly.

To render a Plotly visual in Kusto Query Language, the query must generate a table with a single string cell containing Plotly JSON. This Plotly JSON string can be generated by one of the following two methods:

Write your own Plotly visualization in Python

In this method, you dynamically create the Plotly JSON string in Python using the Plotly package. This process requires use of the python() plugin. The Python script is run on the existing nodes using the inline python() plugin. It generates a Plotly JSON that is rendered by the client application.

All types of Plotly visualizations are supported.

Example

The following query uses inline Python to create a 3D scatter chart:

OccupancyDetection
| project Temperature, Humidity, CO2, Occupancy
| where rand() < 0.1
| evaluate python(typeof(plotly:string),
```if 1:
    import plotly.express as px
    fig = px.scatter_3d(df, x='Temperature', y='Humidity', z='CO2', color='Occupancy')
    fig.update_layout(title=dict(text="Occupancy detection, plotly 5.11.0"))
    plotly_obj = fig.to_json()
    result = pd.DataFrame(data = [plotly_obj], columns = ["plotly"])
```)

Screenshot of plotly visual type.

In services where the python() plugin isn’t available, you can instead create the Plotly JSON from a preprepared template, as described in the next section.

Use a preprepared Plotly template

In this method, a preprepared Plotly JSON for specific visualization can be reused by replacing the data objects with the required data to be rendered. The templates can be stored in a standard table, and the data replacement logic can be packed in a stored function.

Currently, the supported templates are: plotly_anomaly_fl() and plotly_scatter3d_fl(). Refer to these documents for syntax and usage.

Example

let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
    let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
    let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
    tbl_ex
    | serialize 
    | summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
    | summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
    | extend plotly = scatter3d_chart
    | extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
    | extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
    | extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
    | extend plotly=replace_string(plotly, '$X_NAME$', x_col)
    | extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
    | extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
    | extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
    | extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
    | extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | project plotly
};
Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
| render plotly

Screenshot of output of plotly example.

2.1.11 - Scatter chart visualization

This article describes the scatter chart visualization.

In a scatter chart visual, the first column is the x-axis and should be a numeric column. Other numeric columns are y-axes. Scatter plots are used to observe relationships between variables. The scatter chart visual can also be used in the context of Geospatial visualizations.

Syntax

T | render scatterchart [with (propertyName = propertyValue [, …])]

Parameters

Name | Type | Required | Description
T | string | ✔️ | Input table name.
propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyName | PropertyValue
accumulate | Whether the value of each measure gets added to all its predecessors (true or false).
kind | Further elaboration of the visualization kind. For more information, see kind property.
legend | Whether to display a legend or not (visible or hidden).
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to.
ymin | The minimum value to be displayed on the y-axis.
ymax | The maximum value to be displayed on the y-axis.
title | The title of the visualization (of type string).
xaxis | How to scale the x-axis (linear or log).
xcolumn | Which column in the result is used for the x-axis.
xtitle | The title of the x-axis (of type string).
yaxis | How to scale the y-axis (linear or log).
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column.
ytitle | The title of the y-axis (of type string).

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

kind value | Description
map | Expected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations.

Example

This query provides a scatter chart that helps you analyze the correlation between state populations and the total property damage caused by storm events.

StormEvents
| summarize sum(DamageProperty) by State
| lookup PopulationData on State
| project-away State
| render scatterchart with (xtitle="State population", title="Property damage by state", legend=hidden)

Screenshot of scatter chart visualization output.

2.1.12 - Stacked area chart visualization

This article describes the stacked area chart visualization.

The stacked area chart visual shows a continuous relationship. This visual is similar to the Area chart, but shows the area under each element of a series. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.

Syntax

T | render stackedareachart [with (propertyName = propertyValue [, …])]

Supported parameters

Name | Type | Required | Description
T | string | ✔️ | Input table name.
propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyName | PropertyValue
accumulate | Whether the value of each measure gets added to all its predecessors (true or false).
legend | Whether to display a legend or not (visible or hidden).
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to.
ymin | The minimum value to be displayed on the y-axis.
ymax | The maximum value to be displayed on the y-axis.
title | The title of the visualization (of type string).
xaxis | How to scale the x-axis (linear or log).
xcolumn | Which column in the result is used for the x-axis.
xtitle | The title of the x-axis (of type string).
yaxis | How to scale the y-axis (linear or log).
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column.
ytitle | The title of the y-axis (of type string).

Example

The following query summarizes data from the nyc_taxi table by number of passengers and visualizes the data in a stacked area chart. The x-axis shows the pickup time in two day intervals, and the stacked areas represent different passenger counts.

nyc_taxi
| summarize count() by passenger_count, bin(pickup_datetime, 2d)
| render stackedareachart with (xcolumn=pickup_datetime, series=passenger_count)

Output

Screenshot of stacked area chart visual output.

2.1.13 - Table visualization

This article describes the table visualization.

Default - results are shown as a table.

Syntax

T | render table [with (propertyName = propertyValue [, …])]

Parameters

Name | Type | Required | Description
T | string | ✔️ | Input table name.
propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyName | PropertyValue
accumulate | Whether the value of each measure gets added to all its predecessors (true or false).
legend | Whether to display a legend or not (visible or hidden).
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to.
ymin | The minimum value to be displayed on the y-axis.
ymax | The maximum value to be displayed on the y-axis.
title | The title of the visualization (of type string).
xaxis | How to scale the x-axis (linear or log).
xcolumn | Which column in the result is used for the x-axis.
xtitle | The title of the x-axis (of type string).
yaxis | How to scale the y-axis (linear or log).
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column.
ytitle | The title of the y-axis (of type string).

Example

This query outputs a snapshot of the first 10 storm event records, displayed in a table format.

StormEvents
| take 10 
| render table 

Screenshot of table visualization output.

2.1.14 - Time chart visualization

This article describes the time chart visualization.

A time chart visual is a type of line graph. The first column of the query is the x-axis, and should be a datetime. Other numeric columns are y-axes. The values of one string column are used to group the numeric columns and create different lines in the chart. Other string columns are ignored. The time chart visual is like a line chart except the x-axis is always time.

Syntax

T | render timechart [with (propertyName = propertyValue [, …])]

Parameters

Name | Type | Required | Description
T | string | ✔️ | Input table name.
propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyName | PropertyValue
accumulate | Whether the value of each measure gets added to all its predecessors (true or false).
legend | Whether to display a legend or not (visible or hidden).
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to.
ymin | The minimum value to be displayed on the y-axis.
ymax | The maximum value to be displayed on the y-axis.
title | The title of the visualization (of type string).
xaxis | How to scale the x-axis (linear or log).
xcolumn | Which column in the result is used for the x-axis.
xtitle | The title of the x-axis (of type string).
yaxis | How to scale the y-axis (linear or log).
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column.
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property.
ytitle | The title of the y-axis (of type string).

ysplit property

This visualization supports splitting into multiple y-axis values:

ysplit | Description
none | A single y-axis is displayed for all series data. (Default)
axes | A single chart is displayed with multiple y-axes (one per series).
panels | One chart is rendered for each ycolumn value. Maximum five panels.

Examples

The examples in this section show how to use the syntax to help you get started.

Render a timechart

The following example renders a timechart with the title “Web app. traffic over a month, decomposition”, which decomposes the data into baseline, seasonal, trend, and residual components.

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (baseline, seasonal, trend, residual) = series_decompose(num, -1, 'linefit')  //  decomposition of a set of time series to seasonal, trend, residual, and baseline (seasonal+trend)
| render timechart with(title='Web app. traffic over a month, decomposition')

Screenshot of timechart visualization output.

Label a timechart

The following example renders a timechart that depicts crop damage grouped by week. The timechart x axis label is “Date” and the y axis label is “Crop damage.”

StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31)) 
    and DamageCrops > 0
| summarize EventCount = count() by bin(StartTime, 7d)
| render timechart
    with (
    title="Crop damage over time",
    xtitle="Date",
    ytitle="Crop damage",
    legend=hidden
    )

Screenshot of timechart with labels.

View multiple y-axes

The following example renders daily hail events in the states of Texas, Nebraska, and Kansas. The visualization uses the ysplit property to render each state’s events in separate panels for comparison.

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render timechart with (ysplit=panels)

Screenshot of the time chart query result with the ysplit panels property.
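
To overlay the same series in one chart with a separate y-axis per series, use the ysplit=axes value instead. The following is a minimal variant of the query above:

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render timechart with (ysplit=axes)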


2.1.15 - Time pivot visualization

This article describes the time pivot visualization.

The time pivot visualization provides interactive navigation over the events timeline, pivoting on the time axis.

Syntax

T | render timepivot [with (propertyName = propertyValue [, …])]

Parameters

Name | Type | Required | Description
T | string | ✔️ | Input table name.
propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyName | PropertyValue
accumulate | Whether the value of each measure gets added to all its predecessors (true or false).
legend | Whether to display a legend or not (visible or hidden).
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to.
ymin | The minimum value to be displayed on the y-axis.
ymax | The maximum value to be displayed on the y-axis.
title | The title of the visualization (of type string).
xaxis | How to scale the x-axis (linear or log).
xcolumn | Which column in the result is used for the x-axis.
xtitle | The title of the x-axis (of type string).
yaxis | How to scale the y-axis (linear or log).
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column.
ytitle | The title of the y-axis (of type string).

Example

This query outputs a visualization of flood events in the specified Midwestern states, displayed as a time pivot chart.

let midwesternStates = dynamic([
    "ILLINOIS", "INDIANA", "IOWA", "KANSAS", "MICHIGAN", "MINNESOTA",
    "MISSOURI", "NEBRASKA", "NORTH DAKOTA", "OHIO", "SOUTH DAKOTA", "WISCONSIN"
]);
StormEvents
| where EventType == "Flood" and State in (midwesternStates)
| render timepivot with (xcolumn=State)

Output

Screenshot of timepivot in Kusto.Explorer.

2.1.16 - Treemap visualization

Learn how to use the treemap visualization to visualize data.

Treemaps display hierarchical data as a set of nested rectangles. Each level of the hierarchy is represented by a colored rectangle (branch) containing smaller rectangles (leaves).

Syntax

T | render treemap [with (propertyName = propertyValue [, …])]

Parameters

Name | Type | Required | Description
T | string | ✔️ | Input table name.
propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyName | PropertyValue
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to.

Example

This query counts the number of storm events for each type and state, sorts them in descending order, limits the results to the top 30, and then visualizes the data as a treemap.

StormEvents
| summarize StormEvents=count() by EventType, State
| sort by StormEvents
| limit 30
| render treemap with(title="Storm Events by EventType and State")

Screenshot of treemap visualization output.

2.2 - render operator

Learn how to use the render operator to instruct the user agent to render a visualization of the query results.

Instructs the user agent to render a visualization of the query results.

The render operator must be the last operator in the query, and can only be used with queries that produce a single tabular data stream result. The render operator doesn’t modify data. It injects an annotation (“Visualization”) into the result’s extended properties. The annotation contains the information provided by the operator in the query. The interpretation of the visualization information is done by the user agent. Different agents, such as Kusto.Explorer or Azure Data Explorer web UI, may support different visualizations.

The data model of the render operator looks at the tabular data as if it has three kinds of columns:

  • The x axis column (indicated by the xcolumn property).

  • The series columns (any number of columns indicated by the series property.) For each record, the combined values of these columns define a single series, and the chart has as many series as there are distinct combined values.

  • The y axis columns (any number of columns indicated by the ycolumns property). For each record, the series has as many measurements (“points” in the chart) as there are y-axis columns.

The user agent is free to guess any of these roles that aren’t specified by the query. In particular, having “uninteresting” columns in the schema of the result might translate into it guessing wrong. Try projecting-away such columns when that happens.

Syntax

T | render visualization [with ( propertyName = propertyValue [, …])]

Parameters

Name | Type | Required | Description
T | string | ✔️ | Input table name.
visualization | string | ✔️ | Indicates the kind of visualization to use. Must be one of the supported values in the following list.
propertyName, propertyValue | string | | A comma-separated list of key-value property pairs. See supported properties.

Visualization

visualization | Description
anomalychart | Similar to timechart, but highlights anomalies using the series_decompose_anomalies function.
areachart | Area graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes.
barchart | First column is the x-axis and can be text, datetime, or numeric. Other columns are numeric, displayed as horizontal strips.
card | First result record is treated as a set of scalar values and is shown as a card.
columnchart | Like barchart, with vertical strips instead of horizontal strips.
ladderchart | Last two columns are the x-axis, other columns are the y-axis.
linechart | Line graph.
piechart | First column is the color-axis, second column is numeric.
pivotchart | Displays a pivot table and chart. User can interactively select data, columns, rows, and various chart types.
scatterchart | Points graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes.
stackedareachart | Stacked area graph.
table | Default - results are shown as a table.
timechart | Line graph. First column is the x-axis and must be datetime. Other (numeric) columns are y-axes. One string column’s values are used to group the numeric columns and create different lines in the chart (further string columns are ignored).
timepivot | Interactive navigation over the events time-line (pivoting on time axis).
treemap | Displays hierarchical data as a set of nested rectangles.

Supported properties

PropertyName/PropertyValue indicate additional information to use when rendering. All properties are optional. The supported properties are:

PropertyName | PropertyValue
accumulate | Whether the value of each measure gets added to all its predecessors (true or false).
kind | Further elaboration of the visualization kind. For more information, see kind property.
legend | Whether to display a legend or not (visible or hidden).
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to.
ymin | The minimum value to be displayed on the y-axis.
ymax | The maximum value to be displayed on the y-axis.
title | The title of the visualization (of type string).
xaxis | How to scale the x-axis (linear or log).
xcolumn | Which column in the result is used for the x-axis.
xtitle | The title of the x-axis (of type string).
yaxis | How to scale the y-axis (linear or log).
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column.
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property.
ytitle | The title of the y-axis (of type string).
anomalycolumns | Property relevant only for anomalychart. Comma-delimited list of columns, which will be considered as anomaly series and displayed as points on the chart.

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

Visualization | kind | Description
areachart | default | Each “area” stands on its own.
areachart | unstacked | Same as default.
areachart | stacked | Stack “areas” to the right.
areachart | stacked100 | Stack “areas” to the right and stretch each one to the same width as the others.
barchart | default | Each “bar” stands on its own.
barchart | unstacked | Same as default.
barchart | stacked | Stack “bars”.
barchart | stacked100 | Stack “bars” and stretch each one to the same width as the others.
columnchart | default | Each “column” stands on its own.
columnchart | unstacked | Same as default.
columnchart | stacked | Stack “columns” one atop the other.
columnchart | stacked100 | Stack “columns” and stretch each one to the same height as the others.
scatterchart | map | Expected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations.
piechart | map | Expected columns are [Longitude, Latitude] or GeoJSON point, color-axis and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations.

ysplit property

Some visualizations support splitting into multiple y-axis values:

ysplit | Description
none | A single y-axis is displayed for all series data. (Default)
axes | A single chart is displayed with multiple y-axes (one per series).
panels | One chart is rendered for each ycolumn value. Maximum five panels.

How to render continuous data

Several visualizations are used for rendering sequences of values, for example, linechart, timechart, and areachart. These visualizations have the following conceptual model:

  • One column in the table represents the x-axis of the data. This column can be explicitly defined using the xcolumn property. If not defined, the user agent picks the first column that is appropriate for the visualization.
    • For example: in the timechart visualization, the user agent uses the first datetime column.
    • If this column is of type dynamic and it holds an array, the individual values in the array will be treated as the values of the x-axis.
  • One or more columns in the table represent one or more measures that vary by the x-axis. These columns can be explicitly defined using the ycolumns property. If not defined, the user agent picks all columns that are appropriate for the visualization.
    • For example: in the timechart visualization, the user agent uses all columns with a numeric value that haven’t been specified otherwise.
    • If the x-axis is an array, the values of each y-axis should also be an array of a similar length, with each y-axis occurring in a single column.
  • Zero or more columns in the table represent a unique set of dimensions that group together the measures. These columns can be specified by the series property, or the user agent will pick them automatically from the columns that are otherwise unspecified.


Example

InsightsMetrics
| where Computer == "DC00.NA.contosohotels.com"
| where Namespace  == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
| render timechart
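
In the query above, the user agent infers the column roles. They can also be stated explicitly with the xcolumn, ycolumns, and series properties; the following is a sketch of the same query (note that avg(Val) produces a column named avg_Val):

InsightsMetrics
| where Computer == "DC00.NA.contosohotels.com"
| where Namespace  == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
| render timechart with (xcolumn=TimeGenerated, ycolumns=avg_Val, series=Computer)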

3 - Summarize operator

3.1 - Kusto partition & compose intermediate aggregation results

Learn how to use the hll() and tdigest() functions to partition and compose intermediate results of aggregations.

Suppose you want to calculate the count of distinct users every day over the last seven days. You can run summarize dcount(user) once a day with a span filtered to the last seven days. This method is inefficient, because each time the calculation is run, there’s a six-day overlap with the previous calculation. You can also calculate an aggregate for each day, and then combine these aggregates. This method requires you to “remember” the last six results, but it’s much more efficient.

Partitioning queries as described is easy for simple aggregates, such as count() and sum(). It can also be useful for complex aggregates, such as dcount() and percentiles(). This article explains how Kusto supports such calculations.

The following examples show how to use hll/tdigest and demonstrate that using these commands is highly performant in some scenarios:

range x from 1 to 1000000 step 1
| summarize hll(x,4)
| project sizeInMb = estimate_data_size(hll_x) / pow(1024,2)

Output

sizeInMb
1.0000524520874

Because this object exceeds the default maximum value size of a column, ingesting it into a table before applying the encoding policy described below will ingest null:

.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)
MyTable
| project isempty(hll_x)

Output

Column1
1

To avoid ingesting null, use the special encoding policy type bigobject, which overrides the MaxValueSize to 2 MB like this:

.alter column MyTable.hll_x policy encoding type='bigobject'

Now ingesting a value into the same table:

.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)

ingests the second value successfully:

MyTable
| project isempty(hll_x)

Output

Column1
1
0

Example: Count with binned timestamp

There’s a table, PageViewsHllTDigest, containing hll values of Pages viewed in each hour. You want these values binned to 12h. Merge the hll values using the hll_merge() aggregate function, with the timestamp binned to 12h. Use the function dcount_hll to return the final dcount value:

PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 12h)
| project Timestamp , dcount_hll(merged_hll)

Output

Timestamp | dcount_hll_merged_hll
2016-05-01 12:00:00.0000000 | 20056275
2016-05-02 00:00:00.0000000 | 38797623
2016-05-02 12:00:00.0000000 | 39316056
2016-05-03 00:00:00.0000000 | 13685621

To bin the timestamp to 1d:

PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 1d)
| project Timestamp , dcount_hll(merged_hll)

Output

Timestamp | dcount_hll_merged_hll
2016-05-01 00:00:00.0000000 | 20056275
2016-05-02 00:00:00.0000000 | 64135183
2016-05-03 00:00:00.0000000 | 13685621

The same query may be done over the values of tdigest, which represent the BytesDelivered in each hour:

PageViewsHllTDigest
| summarize merged_tdigests = merge_tdigest(tdigestBytesDel) by bin(Timestamp, 12h)
| project Timestamp , percentile_tdigest(merged_tdigests, 95, typeof(long))

Output

Timestamp | percentile_tdigest_merged_tdigests
2016-05-01 12:00:00.0000000 | 170200
2016-05-02 00:00:00.0000000 | 152975
2016-05-02 12:00:00.0000000 | 181315
2016-05-03 00:00:00.0000000 | 146817

Example: Temporary table

Kusto limits are reached when datasets are too large to run regular queries that calculate percentile() or dcount() over them, yet you still need to run periodic queries over the data.

To solve this problem, newly added data may be added to a temp table as hll or tdigest values, using hll() when the required operation is dcount and tdigest() when the required operation is percentile, by way of set/append commands or an update policy. In this case, the intermediate results of dcount or tdigest are saved into another dataset, which should be smaller than the large target one.

When you need to get the final results of these values, the queries may use hll/tdigest mergers: hll_merge()/merge_tdigest(). Then, after getting the merged values, percentile_tdigest()/dcount_hll() may be invoked on these merged values to get the final result of dcount or percentiles.

Assume there’s a table, PageViews, into which data is ingested daily, and you want to calculate, for each day, the distinct count of pages viewed later than date = datetime(2016-05-01 18:00:00.0000000).

Run the following query:

PageViews
| where Timestamp > datetime(2016-05-01 18:00:00.0000000)
| summarize percentile(BytesDelivered, 90), dcount(Page,2) by bin(Timestamp, 1d)

Output

Timestamp | percentile_BytesDelivered_90 | dcount_Page
2016-05-01 00:00:00.0000000 | 83634 | 20056275
2016-05-02 00:00:00.0000000 | 82770 | 64135183
2016-05-03 00:00:00.0000000 | 72920 | 13685621

This query aggregates all the values every time you run this query (for example, if you want to run it many times a day).

If you save the hll and tdigest values (which are the intermediate results of dcount and percentile) into a temp table, PageViewsHllTDigest, using an update policy or set/append commands, you need only merge the values and then invoke dcount_hll/percentile_tdigest, as in the following query:

PageViewsHllTDigest
| summarize  percentile_tdigest(merge_tdigest(tdigestBytesDel), 90), dcount_hll(hll_merge(hllPage)) by bin(Timestamp, 1d)

Output

Timestamp | percentile_tdigest_merge_tdigests_tdigestBytesDel | dcount_hll_hll_merge_hllPage
2016-05-01 00:00:00.0000000 | 84224 | 20056275
2016-05-02 00:00:00.0000000 | 83486 | 64135183
2016-05-03 00:00:00.0000000 | 72247 | 13685621

This query should be more performant, as it runs over a smaller table. In this example, the first query runs over ~215M records, while the second one runs over just 32 records.

Example: Intermediate results

The Retention Query. Assume you have a table that summarizes when each Wikipedia page was viewed (sample size is 10M), and you want to find, for each pair of dates (date1, date2) with date1 < date2, the percentage of pages viewed on both date1 and date2 relative to the pages viewed on date1.

The trivial way uses join and summarize operators:

// Get the total pages viewed each day
let totalPagesPerDay = PageViewsSample
| summarize by Page, Day = startofday(Timestamp)
| summarize count() by Day;
// Join the table to itself to get a grid where 
// each row shows foreach page1, in which two dates
// it was viewed.
// Then count the pages between each two dates to
// get how many pages were viewed between date1 and date2.
PageViewsSample
| summarize by Page, Day1 = startofday(Timestamp)
| join kind = inner
(
    PageViewsSample
    | summarize by Page, Day2 = startofday(Timestamp)
)
on Page
| where Day2 > Day1
| summarize count() by Day1, Day2
| join kind = inner
    totalPagesPerDay
on $left.Day1 == $right.Day
| project Day1, Day2, Percentage = count_*100.0/count_1

Output

Day1 | Day2 | Percentage
2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 34.0645725975255
2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.618368960101
2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.6291376489636

The above query took ~18 seconds.

When you use the hll(), hll_merge(), and dcount_hll() functions, the equivalent query ends after ~1.3 seconds, showing that the hll functions speed up the query above by ~14 times:

let Stats=PageViewsSample | summarize pagehll=hll(Page, 2) by day=startofday(Timestamp); // saving the hll values (intermediate results of the dcount values)
let day0=toscalar(Stats | summarize min(day)); // finding the min date over all dates.
let dayn=toscalar(Stats | summarize max(day)); // finding the max date over all dates.
let daycount=tolong((dayn-day0)/1d); // finding the range between max and min
Stats
| project idx=tolong((day-day0)/1d), day, pagehll
| mv-expand pidx=range(0, daycount) to typeof(long)
// Extend the column to get the dcount value from hll'ed values for each date (same as totalPagesPerDay from the above query)
| extend key1=iff(idx < pidx, idx, pidx), key2=iff(idx < pidx, pidx, idx), pages=dcount_hll(pagehll)
// For each two dates, merge the hll'ed values to get the total dcount over each two dates, 
// This helps to get the pages viewed in both date1 and date2 (see the description below about the intersection_size)
| summarize (day1, pages1)=arg_min(day, pages), (day2, pages2)=arg_max(day, pages), union_size=dcount_hll(hll_merge(pagehll)) by key1, key2
| where day2 > day1
// To get pages viewed in date1 and also date2, look at the merged dcount of date1 and date2, subtract it from pages of date1 + pages on date2.
| project pages1, day1,day2, intersection_size=(pages1 + pages2 - union_size)
| project day1, day2, Percentage = intersection_size*100.0 / pages1

Output

day1 | day2 | Percentage
2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 33.2298494510578
2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.9773830213667
2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.5160020350006

3.2 - summarize operator

Learn how to use the summarize operator to produce a table that summarizes the content of the input table.

Produces a table that aggregates the content of the input table.

Syntax

T | summarize [ SummarizeParameters ] [[Column =] Aggregation [, …]] [by [Column =] GroupExpression [, …]]

Parameters

Name | Type | Required | Description
Column | string | | The name for the result column. Defaults to a name derived from the expression.
Aggregation | string | ✔️ | A call to an aggregation function such as count() or avg(), with column names as arguments.
GroupExpression | scalar | ✔️ | A scalar expression that can reference the input data. The output will have as many records as there are distinct values of all the group expressions.
SummarizeParameters | string | | Zero or more space-separated parameters in the form of Name = Value that control the behavior. See supported parameters.

Supported parameters

Name | Description
hint.num_partitions | Specifies the number of partitions used to share the query load on cluster nodes. See shuffle query.
hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. See shuffle query.
hint.strategy=shuffle | The shuffle strategy query shares the query load on cluster nodes, where each node will process one partition of the data. See shuffle query.

Returns

The input rows are arranged into groups having the same values of the by expressions. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by columns and also at least one column for each computed aggregate. (Some aggregation functions return multiple columns.)

The result has as many rows as there are distinct combinations of by values (which may be zero). If there are no group keys provided, the result has a single record.

To summarize over ranges of numeric values, use bin() to reduce ranges to discrete values.
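
For example, the following minimal sketch over the StormEvents sample table buckets the numeric DamageProperty column into ranges of 5,000:

StormEvents
| where DamageProperty > 0
| summarize EventCount = count() by bin(DamageProperty, 5000)
| sort by DamageProperty asc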

Default values of aggregations

The following table summarizes the default values of aggregations:

Operator | Default value
count(), countif(), dcount(), dcountif(), count_distinct(), sum(), sumif(), variance(), varianceif(), stdev(), stdevif() | 0
make_bag(), make_bag_if(), make_list(), make_list_if(), make_set(), make_set_if() | empty dynamic array ([])
All others | null

Examples

The examples in this section show how to use the syntax to help you get started.


Unique combination

The following query determines what unique combinations of State and EventType there are for storms that resulted in direct injury. There are no aggregation functions, just group-by keys. The output will just show the columns for those results.

StormEvents
| where InjuriesDirect > 0
| summarize by State, EventType

Output

The following table shows only the first 5 rows. To see the full output, run the query.

State | EventType
TEXAS | Thunderstorm Wind
TEXAS | Flash Flood
TEXAS | Winter Weather
TEXAS | High Wind
TEXAS | Flood

Minimum and maximum timestamp

Finds the minimum and maximum duration of heavy rain storms in Hawaii. There’s no group-by clause, so there’s just one row in the output.

StormEvents
| where State == "HAWAII" and EventType == "Heavy Rain"
| project Duration = EndTime - StartTime
| summarize Min = min(Duration), Max = max(Duration)

Output

Min | Max
01:08:00 | 11:55:00

Distinct count

The following query calculates the number of unique storm event types for each state and sorts the results by the number of unique storm types:

StormEvents
| summarize TypesOfStorms=dcount(EventType) by State
| sort by TypesOfStorms

Output

The following table shows only the first 5 rows. To see the full output, run the query.

State | TypesOfStorms
TEXAS | 27
CALIFORNIA | 26
PENNSYLVANIA | 25
GEORGIA | 24
ILLINOIS | 23

Histogram

The following example calculates a histogram of storm event types for storms lasting longer than 1 day. Because Duration has many values, use bin() to group its values into 1-day intervals.

StormEvents
| project EventType, Duration = EndTime - StartTime
| where Duration > 1d
| summarize EventCount=count() by EventType, Length=bin(Duration, 1d)
| sort by Length

Output

EventType | Length | EventCount
Drought | 30.00:00:00 | 1646
Wildfire | 30.00:00:00 | 11
Heat | 30.00:00:00 | 14
Flood | 30.00:00:00 | 20
Heavy Rain | 29.00:00:00 | 42

Aggregates default values

When the input of summarize operator has at least one empty group-by key, its result is empty, too.

When the input of summarize operator doesn’t have an empty group-by key, the result is the default values of the aggregates used in the summarize. For more information, see Default values of aggregations.

datatable(x:long)[]
| summarize any_x=take_any(x), arg_max_x=arg_max(x, *), arg_min_x=arg_min(x, *), avg(x), buildschema(todynamic(tostring(x))), max(x), min(x), percentile(x, 55), hll(x) ,stdev(x), sum(x), sumif(x, x > 0), tdigest(x), variance(x)

Output

any_x | arg_max_x | arg_min_x | avg_x | schema_x | max_x | min_x | percentile_x_55 | hll_x | stdev_x | sum_x | sumif_x | tdigest_x | variance_x
 | | | NaN | | | | | | 0 | 0 | 0 | | 0

The result of avg(x) is NaN due to dividing by 0.

datatable(x:long)[]
| summarize  count(x), countif(x > 0) , dcount(x), dcountif(x, x > 0)

Output

count_x | countif_ | dcount_x | dcountif_x
0 | 0 | 0 | 0
datatable(x:long)[]
| summarize  make_set(x), make_list(x)

Output

set_x | list_x
[] | []

The avg aggregate sums all the non-null values and counts only those values that participated in the calculation (nulls aren’t taken into account).

range x from 1 to 4 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize sum(y), avg(y)

Output

sum_y | avg_y
15 | 5

The regular count will count nulls:

range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize count(y)

Output

count_y
2
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize make_set(y), make_set(y)

Output

set_y | set_y1
[5.0] | [5.0]

4 - as operator

Learn how to use the as operator to bind a name to the operator’s input tabular expression.

Binds a name to the operator’s input tabular expression. This operator allows the query to reference the value of the tabular expression multiple times without breaking the query apart and binding a name through a let statement.

To optimize multiple uses of the as operator within a single query, see Named expressions.

Syntax

T | as [hint.materialized = Materialized] Name

Parameters

Name | Type | Required | Description
T | string | ✔️ | The tabular expression to rename.
Name | string | ✔️ | The temporary name for the tabular expression.
hint.materialized | bool | | If Materialized is set to true, the value of the tabular expression output is wrapped by a materialize() function call. Otherwise, the value is recalculated on every reference.

Examples

In the following two examples, the generated TableName column consists of ‘T1’ and ‘T2’.

range x from 1 to 5 step 1 
| as T1 
| union withsource=TableName (range x from 1 to 5 step 1 | as T2)

Alternatively, you can write the same example as follows:

union withsource=TableName (range x from 1 to 5 step 1 | as T1), (range x from 1 to 5 step 1 | as T2)

Output

TableName | x
T1 | 1
T1 | 2
T1 | 3
T1 | 4
T1 | 5
T2 | 1
T2 | 2
T2 | 3
T2 | 4
T2 | 5

In the following example, the ‘left side’ of the join is MyLogTable filtered by type == "Event" and Name == "Start", and the ‘right side’ of the join is MyLogTable filtered by type == "Event" and Name == "Stop".

MyLogTable  
| where type == "Event"
| as T
| where Name == "Start"
| join (
    T
    | where Name == "Stop"
) on ActivityId
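
If the bound expression is expensive to compute, recalculation on each reference can be avoided with the hint.materialized parameter. The following is a sketch of the same join with materialization (MyLogTable as in the example above):

MyLogTable  
| where type == "Event"
| as hint.materialized=true T
| where Name == "Start"
| join (
    T
    | where Name == "Stop"
) on ActivityId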

5 - consume operator

Learn how to use the consume operator to consume the tabular data stream handed to the operator.

Consumes the tabular data stream handed to the operator.

The consume operator is mostly used for triggering the query side-effect without actually returning the results back to the caller.

The consume operator can be used for estimating the cost of a query without actually delivering the results back to the client. (The estimation isn’t exact for various reasons; for example, consume is calculated distributively, so T | consume won’t transmit the table’s data between the nodes of the cluster.)

Syntax

T | consume [decodeblocks = DecodeBlocks]

Parameters

Name | Type | Required | Description
DecodeBlocks | bool | | If set to true, or if the request property perftrace is set to true, the consume operator won’t just enumerate the records at its input, but actually force each value in those records to be decompressed and decoded.
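
Example

For example, this minimal sketch forces full evaluation of a query over the StormEvents sample table without returning results to the caller:

StormEvents
| extend Duration = EndTime - StartTime
| consume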

6 - count operator

Learn how to use the count operator to return the number of records in the input record set.

Returns the number of records in the input record set.

Syntax

T | count

Parameters

Name | Type | Required | Description
T | string | ✔️ | The tabular input whose records are to be counted.

Returns

This function returns a table with a single record and column of type long. The value of the only cell is the number of records in T.

Example

When you use the count operator with a table name, like StormEvents, it will return the total number of records in that table.

StormEvents | count

Output

Count
59066

For information about the count() aggregation function, see count() (aggregation function).

7 - datatable operator

Learn how to use the datatable operator to define a table with given schema and data.

Returns a table whose schema and values are defined in the query itself.

Syntax

datatable( ColumnName : ColumnType [, …]) [ ScalarValue [, …] ]

Parameters

Name | Type | Required | Description
ColumnName | string | ✔️ | The name for a column.
ColumnType | string | ✔️ | The type of data in the column.
ScalarValue | scalar | ✔️ | The value to insert into the table. The total number of values must be a multiple of the number of columns in the table. Each value is assigned to a column based on its position. Specifically, the n’th value is assigned to the column at position n % NumColumns, where NumColumns is the total number of columns.

Returns

This operator returns a data table of the given schema and data.

Example

This example creates a table with Date, Event, and MoreData columns, filters rows with Event descriptions longer than 4 characters, and adds a new column key2 to each row from the MoreData dynamic object.

datatable(Date:datetime, Event:string, MoreData:dynamic) [
    datetime(1910-06-11), "Born", dynamic({"key1":"value1", "key2":"value2"}),
    datetime(1930-01-01), "Enters Ecole Navale", dynamic({"key1":"value3", "key2":"value4"}),
    datetime(1953-01-01), "Published first book", dynamic({"key1":"value5", "key2":"value6"}),
    datetime(1997-06-25), "Died", dynamic({"key1":"value7", "key2":"value8"}),
]
| where strlen(Event) > 4
| extend key2 = MoreData.key2

Output

Date | Event | MoreData | key2
1930-01-01 00:00:00.0000000 | Enters Ecole Navale | {"key1": "value3", "key2": "value4"} | value4
1953-01-01 00:00:00.0000000 | Published first book | {"key1": "value5", "key2": "value6"} | value6

8 - distinct operator

Learn how to use the distinct operator to create a table with the distinct combination of the columns of the input table.

Produces a table with the distinct combination of the provided columns of the input table.

Syntax

T | distinct ColumnName[,ColumnName2, ...]

Parameters

Name | Type | Required | Description
ColumnName | string | ✔️ | The column name to search for distinct values.

Example

Shows the distinct combinations of state and event type that led to over 45 direct injuries.

StormEvents
| where InjuriesDirect > 45
| distinct State, EventType

Output

State | EventType
TEXAS | Winter Weather
KANSAS | Tornado
MISSOURI | Excessive Heat
OKLAHOMA | Thunderstorm Wind
OKLAHOMA | Excessive Heat
ALABAMA | Tornado
ALABAMA | Heat
TENNESSEE | Heat
CALIFORNIA | Wildfire

If the group-by keys have high cardinality, try summarize by ... with the shuffle strategy, as shown in the sketch below.
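
For example, the distinct query above can be rewritten as a summarize with the shuffle strategy (a sketch; beneficial mainly when the keys have many distinct values):

StormEvents
| where InjuriesDirect > 45
| summarize hint.strategy=shuffle by State, EventType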

9 - evaluate plugin operator

Learn how to use the evaluate plugin operator to invoke plugins.

Invokes a service-side query extension (plugin).

The evaluate operator is a tabular operator that allows you to invoke query language extensions known as plugins. Unlike other language constructs, plugins can be enabled or disabled. Plugins aren’t “bound” by the relational nature of the language. In other words, they may not have a predefined, statically determined, output schema.

Syntax

[T |] evaluate [ evaluateParameters ] PluginName ([ PluginArgs ])

Parameters

Name | Type | Required | Description
T | string | | A tabular input to the plugin. Some plugins don’t take any input and act as a tabular data source.
evaluateParameters | string | | Zero or more space-separated evaluate parameters in the form of Name = Value that control the behavior of the evaluate operation and execution plan. Each plugin may decide differently how to handle each parameter. Refer to each plugin’s documentation for specific behavior.
PluginName | string | ✔️ | The mandatory name of the plugin being invoked.
PluginArgs | string | | Zero or more comma-separated arguments to provide to the plugin.

Evaluate parameters

The following parameters are supported:

Name | Values | Description
hint.distribution | single, per_node, per_shard | Distribution hints.
hint.pass_filters | true, false | Allow evaluate operator to passthrough any matching filters before the plugin. A filter is considered ‘matched’ if it refers to a column existing before the evaluate operator. Default: false.
hint.pass_filters_column | column_name | Allow plugin operator to passthrough filters referring to column_name before the plugin. Parameter can be used multiple times with different column names.

Plugins

The following plugins are supported:

Distribution hints

Distribution hints specify how the plugin execution will be distributed across multiple cluster nodes. Each plugin may implement different support for distribution. The plugin’s documentation specifies the distribution options supported by the plugin.

Possible values:

  • single: A single instance of the plugin will run over the entire query data.
  • per_node: If the query before the plugin call is distributed across nodes, then an instance of the plugin will run on each node over the data that it contains.
  • per_shard: If the data before the plugin call is distributed across shards, then an instance of the plugin will run over each shard of the data.
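
Example

As an illustration, the following minimal sketch invokes the autocluster plugin over the StormEvents sample table (assuming the plugin is enabled on the cluster):

StormEvents
| where monthofyear(StartTime) == 5
| evaluate autocluster()   // assumes the autocluster plugin is enabled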

10 - extend operator

Learn how to use the extend operator to create calculated columns and append them to the result set.

Creates calculated columns and appends them to the result set.

Syntax

T | extend [ColumnName | (ColumnName[, …]) =] Expression [, …]

Parameters

Name | Type | Required | Description
T | string | ✔️ | Tabular input to extend.
ColumnName | string | | Name of the column to add or update.
Expression | string | ✔️ | Calculation to perform over the input.
  • If ColumnName is omitted, the output column name of Expression is automatically generated.
  • If Expression returns more than one column, a list of column names can be specified in parentheses. Then, Expression’s output columns are given the specified names. If a list of column names isn’t specified, all of Expression’s output columns are added to the output with generated names.

Returns

A copy of the input tabular result set, such that:

  1. Column names noted by extend that already exist in the input are removed and appended as their new calculated values.
  2. Column names noted by extend that don’t exist in the input are appended as their new calculated values.

Note that the new calculated columns do not have an index. In most cases, if the new column is set to be exactly the same as an existing table column that has an index, Kusto can automatically use the existing index. However, in some complex scenarios this propagation is not done. In such cases, if the goal is to rename a column, use the project-rename operator instead.

Example

StormEvents
| project EndTime, StartTime
| extend Duration = EndTime - StartTime

The following table shows only the first 10 results. To see the full output, run the query.

EndTime | StartTime | Duration
2007-01-01T00:00:00Z | 2007-01-01T00:00:00Z | 00:00:00
2007-01-01T00:25:00Z | 2007-01-01T00:25:00Z | 00:00:00
2007-01-01T02:24:00Z | 2007-01-01T02:24:00Z | 00:00:00
2007-01-01T03:45:00Z | 2007-01-01T03:45:00Z | 00:00:00
2007-01-01T04:35:00Z | 2007-01-01T04:35:00Z | 00:00:00
2007-01-01T04:37:00Z | 2007-01-01T03:37:00Z | 01:00:00
2007-01-01T05:00:00Z | 2007-01-01T00:00:00Z | 05:00:00
2007-01-01T05:00:00Z | 2007-01-01T00:00:00Z | 05:00:00
2007-01-01T06:00:00Z | 2007-01-01T00:00:00Z | 06:00:00
2007-01-01T06:00:00Z | 2007-01-01T00:00:00Z | 06:00:00

11 - externaldata operator

Learn how to use the externaldata operator to return a data table of the given schema whose data was parsed from the specified storage artifact.

The externaldata operator returns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.

Syntax

externaldata (columnName:columnType [, …] ) [ storageConnectionString [, …] ] [with ( propertyName = propertyValue [, …])]

Parameters

Name | Type | Required | Description
columnName, columnType | string | ✔️ | A list of column names and their types. This list defines the schema of the table.
storageConnectionString | string | ✔️ | A storage connection string of the storage artifact to query.
propertyName, propertyValue | string | | A list of optional supported properties that determines how to interpret the data retrieved from storage.

Supported properties

Property | Type | Description
format | string | The data format. If unspecified, an attempt is made to detect the data format from the file extension. The default is CSV. All ingestion data formats are supported.
ignoreFirstRecord | bool | If set to true, the first record in every file is ignored. This property is useful when querying CSV files with headers.
ingestionMapping | string | Indicates how to map data from the source file to the actual columns in the operator result set. See data mappings.

Returns

The externaldata operator returns a data table of the given schema whose data was parsed from the specified storage artifact, indicated by the storage connection string.

Examples

The examples query data in an external storage file.

Fetch a list of user IDs stored in Azure Blob Storage

The following example shows how to find all records in a table whose UserID column falls into a known set of IDs, held (one per line) in an external storage file. Since the data format isn’t specified, the detected data format is TXT.

Users
| where UserID in ((externaldata (UserID:string) [
    @"https://storageaccount.blob.core.windows.net/storagecontainer/users.txt" 
      h@"?...SAS..." // Secret token needed to access the blob
    ]))
| ...

Query multiple data files

The following example queries multiple data files stored in external storage.

externaldata(Timestamp:datetime, ProductId:string, ProductDescription:string)
[
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/01/part-00000-7e967c99-cf2b-4dbb-8c53-ce388389470d.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/02/part-00000-ba356fa4-f85f-430a-8b5a-afd64f128ca4.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/03/part-00000-acb644dc-2fc6-467c-ab80-d1590b23fc31.csv.gz?...SAS..."
]
with(format="csv")
| summarize count() by ProductId

The above example can be thought of as a quick way to query multiple data files without defining an external table.

Query hierarchical data formats

To query hierarchical data format, such as JSON, Parquet, Avro, or ORC, ingestionMapping must be specified in the operator properties. In this example, there’s a JSON file stored in Azure Blob Storage with the following contents:

{
  "timestamp": "2019-01-01 10:00:00.238521",   
  "data": {    
    "tenant": "e1ef54a6-c6f2-4389-836e-d289b37bcfe0",   
    "method": "RefreshTableMetadata"   
  }   
}   
{
  "timestamp": "2019-01-01 10:00:01.845423",   
  "data": {   
    "tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324",   
    "method": "GetFileList"   
  }   
}
...

To query this file using the externaldata operator, a data mapping must be specified. The mapping dictates how to map JSON fields to the operator result set columns:

externaldata(Timestamp: datetime, TenantId: guid, MethodName: string)
[ 
   h@'https://mycompanystorage.blob.core.windows.net/events/2020/09/01/part-0000046c049c1-86e2-4e74-8583-506bda10cca8.json?...SAS...'
]
with(format='multijson', ingestionMapping='[{"Column":"Timestamp","Properties":{"Path":"$.timestamp"}},{"Column":"TenantId","Properties":{"Path":"$.data.tenant"}},{"Column":"MethodName","Properties":{"Path":"$.data.method"}}]')

The MultiJSON format is used here because single JSON records span multiple lines.

For more info on mapping syntax, see data mappings.

12 - facet operator

Learn how to use the facet operator to return a table for each specified column.

Returns a set of tables, one for each column specified in the facet clause. Each table contains the list of values taken by its column. An additional table can be created by using the with clause. Facet result tables can’t be renamed or referenced by any additional operators.

Syntax

T | facet by ColumnName [, ColumnName2, …] [with ( filterPipe )]

Parameters

Name | Type | Required | Description
ColumnName | string | ✔️ | The column name, or list of column names, to be summarized.
filterPipe | string | | A query expression applied to the input table.

Returns

Multiple tables: one for the with clause, and one for each column.

Example

StormEvents
| where State startswith "A" and EventType has "Heavy"
| facet by State, EventType
    with 
    (
    where StartTime between(datetime(2007-01-04) .. 7d) 
    | project State, StartTime, Source, EpisodeId, EventType
    | take 5
    )

The following is the table generated by the with clause.

State | StartTime | Source | EpisodeId | EventType
ALASKA | 2007-01-04 12:00:00.0000000 | COOP Observer | 2192 | Heavy Snow
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow
ALASKA | 2007-01-06 18:00:00.0000000 | COOP Observer | 2193 | Heavy Snow

The following table is the State facet output table.

State | count_State
ALABAMA | 19
ARIZONA | 33
ARKANSAS | 1
AMERICAN SAMOA | 1
ALASKA | 58

The following table is the EventType facet output table.

EventType | count_EventType
Heavy Rain | 34
Heavy Snow | 78

13 - find operator

Learn how to use the find operator to find rows that match a predicate across a set of tables.

Finds rows that match a predicate across a set of tables.

The scope of the find operator can also be cross-database or cross-cluster.

find in (Table1, Table2, Table3) where Fruit=="apple"

find in (database('*').*) where Fruit == "apple"

find in (cluster('cluster_name').database('MyDB*').*) where Fruit == "apple"

Syntax

  • find [withsource= ColumnName] [in (Tables)] where Predicate [project-smart | project ColumnName[: ColumnType , … ] [, pack_all()]]

  • find Predicate [project-smart | project ColumnName[: ColumnType , … ] [, pack_all()]]

Parameters

Name | Type | Required | Description
ColumnName | string | | By default, the output includes a column called source_ whose values indicate which source table contributed each row. If specified, ColumnName is used instead of source_. After wildcard matching, if the query references tables from more than one database, including the default database, the value of this column has a table name qualified with the database. Similarly, cluster and database qualifications are present in the value if more than one cluster is referenced.
Predicate | bool | ✔️ | This boolean expression is evaluated for each row in each input table. For more information, see predicate-syntax details.
Tables | string | | Zero or more comma-separated table references. By default, find looks in all the tables in the current database. Each reference can be: the name of a table, such as Events; a query expression, such as (Events | where id==42); or a set of tables specified with a wildcard, such as E*, which forms the union of all the tables whose names begin with E.
project-smart or project | string | | If not specified, project-smart is used by default. For more information, see output-schema details.

Returns

Transformation of rows in Table [, Table, …] for which Predicate is true. The rows are transformed according to the output schema.

Output schema

source_ column

The find operator output always includes a source_ column with the source table name. The column can be renamed using the withsource parameter.

results columns

Source tables that don’t contain any column used by the predicate evaluation are filtered out.

When you use project-smart, the columns that appear in the output are:

  • Columns that appear explicitly in the predicate.
  • Columns that are common to all the filtered tables.

The rest of the columns are packed into a property bag and appear in an extra pack_ column. A column that is referenced explicitly by the predicate and appears in multiple tables with multiple types has a different column in the result schema for each such type. Each of the column names is constructed from the original column name and type, separated by an underscore.

When using project ColumnName[: ColumnType , … ] [, pack_all()]:

  • The result table includes the columns specified in the list. If a source table doesn’t contain a certain column, the values in the corresponding rows are null.
  • When you specify a ColumnType with a ColumnName, this column in the “result” has the given type, and the values are cast to that type if needed. The casting doesn’t have an effect on the column type when evaluating the Predicate.
  • When pack_all() is used, all the columns, including the projected columns, are packed into a property bag and appear in an extra column, by default ‘column1’. In the property bag, the source column name serves as the property name and the column’s value serves as the property value.

Predicate syntax

The find operator supports an alternative syntax for the * has term, and using just term, searches a term across all input columns.
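
For example, the following two queries are equivalent:

find "Hernandez"

find where * has "Hernandez"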

For a summary of some filtering functions, see where operator.

Considerations

  • If the project clause references a column that appears in multiple tables and has multiple types, a type must follow this column reference in the project clause.
  • If a column appears in multiple tables and has multiple types and project-smart is in use, there’s a corresponding column for each type in the find’s result, as described in union.
  • When you use project-smart, changes in the predicate, in the source tables set, or in the tables schema, might result in a change to the output schema. If a constant result schema is needed, use project instead.
  • find scope can’t include functions. To include a function in the find scope, define a let statement with the view keyword, as in the sketch after this list.
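
A minimal sketch of that workaround, where MyStoredFunction is a hypothetical stored function that returns a table:

// MyStoredFunction is hypothetical; substitute your own function.
let FnAsView = view () { MyStoredFunction() };
find in (FnAsView) where * has "error"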

Performance tips

  • Use tables as opposed to tabular expressions. If a tabular expression is used, the find operator falls back to a union query, which can result in degraded performance.
  • If a column that appears in multiple tables and has multiple types is part of the project clause, prefer adding a ColumnType to the project clause over modifying the table before passing it to find.
  • Add time-based filters to the predicate. Use a datetime column value or ingestion_time(). See the sketch after this list.
  • Search in specific columns rather than performing a full-text search.
  • It’s better not to reference columns that appear in multiple tables and have multiple types. If the predicate is valid when resolving such columns’ type for more than one type, the query falls back to union. For example, see examples of cases where find acts as a union.
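
A minimal sketch that combines these tips, assuming hypothetical Logs* tables with Timestamp and Message columns:

// Hypothetical table pattern and columns; adjust the names to your schema.
find in (Logs*) where Timestamp > ago(1h) and Message has "timeout"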

Examples

Term lookup across all tables

The query finds all rows from all tables in the current database in which any column includes the word Hernandez. The resulting records are transformed according to the output schema. The output includes rows from the Customers table and the SalesTable table of the ContosoSales database.

find "Hernandez"

Output

This table shows the first three rows of the output.

source_ | pack_
Customers | {"CityName":"Ballard","CompanyName":"NULL","ContinentName":"North America","CustomerKey":5023,"Education":"Partial High School","FirstName":"Devin","Gender":"M","LastName":"Hernandez","MaritalStatus":"S","Occupation":"Clerical","RegionCountryName":"United States","StateProvinceName":"Washington"}
Customers | {"CityName":"Ballard","CompanyName":"NULL","ContinentName":"North America","CustomerKey":7814,"Education":"Partial College","FirstName":"Kristy","Gender":"F","LastName":"Hernandez","MaritalStatus":"S","Occupation":"Professional","RegionCountryName":"United States","StateProvinceName":"Washington"}
Customers | {"CityName":"Ballard","CompanyName":"NULL","ContinentName":"North America","CustomerKey":7888,"Education":"Partial High School","FirstName":"Kari","Gender":"F","LastName":"Hernandez","MaritalStatus":"S","Occupation":"Clerical","RegionCountryName":"United States","StateProvinceName":"Washington"}

Term lookup across all tables matching a name pattern

The query finds all rows from all tables in the current database whose name starts with C, and in which any column includes the word Hernandez. The resulting records are transformed according to the output schema. Now, the output only contains records from the Customers table.

find in (C*) where * has "Hernandez"

Output

This table shows the first three rows of the output.

source_ | pack_
ConferenceSessions | {"conference":"Build 2021","sessionid":"CON-PRT103","session_title":"Roundtable: Advanced Kusto query language topics","session_type":"Roundtable","owner":"Avner Aharoni","participants":"Alexander Sloutsky, Tzvia Gitlin-Troyna","URL":"https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT","kusto_affinity":"Focused"}
ConferenceSessions | {"conference":"Ignite 2018","sessionid":"THR3115","session_title":"Azure Log Analytics: Deep dive into the Azure Kusto query language. ","session_type":"Theater","owner":"Jean Francois Berenguer","participants":"","URL":"https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"}
ConferenceSessions | {"conference":"Build 2021","sessionid":"CON-PRT103","session_title":"Roundtable: Advanced Kusto query language topics","session_type":"Roundtable","owner":"Avner Aharoni","participants":"Alexander Sloutsky, Tzvia Gitlin-Troyna","URL":"https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT","kusto_affinity":"Focused"}

Term lookup across the cluster

The query finds all rows from all tables in all databases in the cluster in which any column includes the word Kusto. This query is a cross-database query. The resulting records are transformed according to the output schema.

find in (database('*').*) where * has "Kusto"

Output

This table shows the first three rows of the output.

source_ | pack_
database("Samples").ConferenceSessions | {"conference":"Build 2021","sessionid":"CON-PRT103","session_title":"Roundtable: Advanced Kusto query language topics","session_type":"Roundtable","owner":"Avner Aharoni","participants":"Alexander Sloutsky, Tzvia Gitlin-Troyna","URL":"https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT","kusto_affinity":"Focused"}
database("Samples").ConferenceSessions | {"conference":"Ignite 2018","sessionid":"THR3115","session_title":"Azure Log Analytics: Deep dive into the Azure Kusto query language. ","session_type":"Theater","owner":"Jean Francois Berenguer","participants":"","URL":"https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"}
database("Samples").ConferenceSessions | {"conference":"Build 2021","sessionid":"CON-PRT103","session_title":"Roundtable: Advanced Kusto query language topics","session_type":"Roundtable","owner":"Avner Aharoni","participants":"Alexander Sloutsky, Tzvia Gitlin-Troyna","URL":"https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT","kusto_affinity":"Focused"}

Term lookup matching a name pattern in the cluster

The query finds all rows from all tables whose name starts with C in all databases whose name starts with S, and in which any column includes the word Kusto. The resulting records are transformed according to the output schema.

find in (database("S*").C*) where * has "Kusto"

Output

This table shows the first three rows of the output.

source_ | pack_
ConferenceSessions | {"conference":"Build 2021","sessionid":"CON-PRT103","session_title":"Roundtable: Advanced Kusto query language topics","session_type":"Roundtable","owner":"Avner Aharoni","participants":"Alexander Sloutsky, Tzvia Gitlin-Troyna","URL":"https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT","kusto_affinity":"Focused"}
ConferenceSessions | {"conference":"Build 2021","sessionid":"CON-PRT103","session_title":"Roundtable: Advanced Kusto query language topics","session_type":"Roundtable","owner":"Avner Aharoni","participants":"Alexander Sloutsky, Tzvia Gitlin-Troyna","URL":"https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT","kusto_affinity":"Focused"}
ConferenceSessions | {"conference":"Build 2021","sessionid":"CON-PRT103","session_title":"Roundtable: Advanced Kusto query language topics","session_type":"Roundtable","owner":"Avner Aharoni","participants":"Alexander Sloutsky, Tzvia Gitlin-Troyna","URL":"https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT","kusto_affinity":"Focused"}

Term lookup in several clusters

The query finds all rows from all tables whose name starts with K in all databases whose name starts with B in cluster1, and from all tables in all databases whose name starts with C in cluster2, in which any column includes the word Kusto. The resulting records are transformed according to the output schema.

find in (cluster("cluster1").database("B*").K*, cluster("cluster2").database("C*").*)
where * has "Kusto"

Term lookup across all tables

The query finds all rows from all tables in which any column includes the word Kusto. The resulting records are transformed according to the output schema.

find "Kusto"

Examples of find output results

The following examples show how find can be used over two tables: EventsTable1 and EventsTable2. Assume the two tables have the following content:

EventsTable1

Session_Id | Level | EventText | Version
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Text1 | v1.0.0
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Text2 | v1.0.0
28b8e46e-3c31-43cf-83cb-48921c3986fc | Error | Some Text3 | v1.0.1
8f057b11-3281-45c3-a856-05ebb18a3c59 | Information | Some Text4 | v1.1.0

EventsTable2

Session_Id | Level | EventText | EventName
f7d5f95f-f580-4ea6-830b-5776c8d64fdd | Information | Some Other Text1 | Event1
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Other Text2 | Event2
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Other Text3 | Event3
15eaeab5-8576-4b58-8fc6-478f75d8fee4 | Error | Some Other Text4 | Event4

Search in common columns, project common and uncommon columns, and pack the rest

The query searches for specific records in EventsTable1 and EventsTable2 based on a given Session_Id and an Error Level. It then projects three specific columns: EventText, Version, and EventName, and packs all other remaining columns into a dynamic object.

find in (EventsTable1, EventsTable2) 
     where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' and Level == 'Error' 
     project EventText, Version, EventName, pack_all()

Output

source_ | EventText | Version | EventName | pack_
EventsTable1 | Some Text2 | v1.0.0 | | {"Session_Id":"acbd207d-51aa-4df7-bfa7-be70eb68f04e", "Level":"Error"}
EventsTable2 | Some Other Text3 | | Event3 | {"Session_Id":"acbd207d-51aa-4df7-bfa7-be70eb68f04e", "Level":"Error"}

Search in common and uncommon columns

The query searches for records that either have Version as ‘v1.0.0’ or EventName as ‘Event1’, and then it projects (selects) four specific columns: Session_Id, EventText, Version, and EventName from those filtered results.

find Version == 'v1.0.0' or EventName == 'Event1' project Session_Id, EventText, Version, EventName

Output

source_ | Session_Id | EventText | Version | EventName
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Some Text1 | v1.0.0 |
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Some Text2 | v1.0.0 |
EventsTable2 | f7d5f95f-f580-4ea6-830b-5776c8d64fdd | Some Other Text1 | | Event1

Use abbreviated notation to search across all tables in the current database

This query searches the database for any records with a Session_Id that matches ‘acbd207d-51aa-4df7-bfa7-be70eb68f04e’. It retrieves records from all tables and columns that contain this specific Session_Id.

find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'

Output

source_ | Session_Id | Level | EventText | pack_
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Text1 | {"Version":"v1.0.0"}
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Text2 | {"Version":"v1.0.0"}
EventsTable2 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Other Text2 | {"EventName":"Event2"}
EventsTable2 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Other Text3 | {"EventName":"Event3"}

Return the results from each row as a property bag

This query searches the database for records with the specified Session_Id and returns all columns of those records as a single dynamic object.

find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' project pack_all()

Output

source_ | pack_
EventsTable1 | {"Session_Id":"acbd207d-51aa-4df7-bfa7-be70eb68f04e", "Level":"Information", "EventText":"Some Text1", "Version":"v1.0.0"}
EventsTable1 | {"Session_Id":"acbd207d-51aa-4df7-bfa7-be70eb68f04e", "Level":"Error", "EventText":"Some Text2", "Version":"v1.0.0"}
EventsTable2 | {"Session_Id":"acbd207d-51aa-4df7-bfa7-be70eb68f04e", "Level":"Information", "EventText":"Some Other Text2", "EventName":"Event2"}
EventsTable2 | {"Session_Id":"acbd207d-51aa-4df7-bfa7-be70eb68f04e", "Level":"Error", "EventText":"Some Other Text3", "EventName":"Event3"}

Examples of cases where find acts as union

The find operator in Kusto can sometimes act like a union operator, mainly when it’s used to search across multiple tables.

Using a nontabular expression as find operand

The query first creates a view that filters EventsTable1 to only include error-level records. Then, it searches within this filtered view and the EventsTable2 table for records with a specific Session_Id.

let PartialEventsTable1 = view() { EventsTable1 | where Level == 'Error' };
find in (PartialEventsTable1, EventsTable2) 
     where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'

Referencing a column that appears in multiple tables and has multiple types

For this example, create two tables by running:

.create tables 
  Table1 (Level:string, Timestamp:datetime, ProcessId:string),
  Table2 (Level:string, Timestamp:datetime, ProcessId:int64)

  • The following query is executed as union:

find in (Table1, Table2) where ProcessId == 1001

The output result schema is (Level:string, Timestamp, ProcessId_string, ProcessId_int).

  • The following query is executed as union, but produces a different result schema:

find in (Table1, Table2) where ProcessId == 1001 project Level, Timestamp, ProcessId:string

The output result schema is (Level:string, Timestamp, ProcessId_string).

14 - fork operator

Learn how to use the fork operator to run multiple consumer operators in parallel.

Runs multiple consumer operators in parallel.

Syntax

T | fork [name=](subquery) [name=](subquery)

Parameters

Name | Type | Required | Description
subquery | string | ✔️ | A downstream pipeline of supported query operators.
name | string | | A temporary name for the subquery result table.

Supported query operators

Returns

Multiple result tables, one for each of the subquery arguments.

Tips

  • Use materialize as a replacement for join or union on fork legs. The input stream is cached by materialize, and then the cached expression can be used in join/union legs. See the sketch after this list.

  • Use batch with materialize of tabular expression statements instead of the fork operator.
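
A minimal sketch of the first tip, using the StormEvents sample table: instead of joining inside fork legs, cache the filtered input once and join the cached expression.

let FloridaStorms = materialize(StormEvents | where State == "FLORIDA");
FloridaStorms
| where DeathsDirect + DeathsIndirect > 1
| join kind=inner (
    FloridaStorms
    | where InjuriesDirect + InjuriesIndirect > 1
    ) on EventId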

Examples

The examples output multiple tables, produced by named and unnamed subqueries.

Unnamed subqueries

StormEvents
| where State == "FLORIDA"
| fork
    ( where DeathsDirect + DeathsIndirect > 1)
    ( where InjuriesDirect + InjuriesIndirect > 1)

Output

This output shows the first few rows and columns of the result table.

GenericResult

StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0
2007-03-13T08:20:00Z | 2007-03-13T08:20:00Z | 4094 | 22961 | FLORIDA | Dense Fog | 3 | 0
2007-09-11T15:26:00Z | 2007-09-11T15:26:00Z | 9578 | 53798 | FLORIDA | Rip Current | 0 | 0

GenericResult

StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect
2007-02-02T03:10:00Z | 2007-02-02T03:16:00Z | 2545 | 17515 | FLORIDA | Tornado | 15 | 0
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0
2007-02-02T03:55:00Z | 2007-02-02T04:10:00Z | 3464 | 20318 | FLORIDA | Tornado | 42 | 0

Named subqueries

In the following examples, the result tables are named "StormsWithDeaths" and "StormsWithInjuries". The two queries below are equivalent ways of naming the fork legs.

StormEvents
| where State == "FLORIDA"
| fork
    (where DeathsDirect + DeathsIndirect > 1 | as StormsWithDeaths)
    (where InjuriesDirect + InjuriesIndirect > 1 | as StormsWithInjuries)

StormEvents
| where State == "FLORIDA"
| fork
    StormsWithDeaths = (where DeathsDirect + DeathsIndirect > 1)
    StormsWithInjuries = (where InjuriesDirect + InjuriesIndirect > 1)

Output

This output shows the first few rows and columns of the result table.

StormsWithDeaths

StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0
2007-03-13T08:20:00Z | 2007-03-13T08:20:00Z | 4094 | 22961 | FLORIDA | Dense Fog | 3 | 0
2007-09-11T15:26:00Z | 2007-09-11T15:26:00Z | 9578 | 53798 | FLORIDA | Rip Current | 0 | 0

StormsWithInjuries

StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect
2007-02-02T03:10:00Z | 2007-02-02T03:16:00Z | 2545 | 17515 | FLORIDA | Tornado | 15 | 0
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0
2007-02-02T03:55:00Z | 2007-02-02T04:10:00Z | 3464 | 20318 | FLORIDA | Tornado | 42 | 0

The following example uses named fork legs to produce two differently aggregated result sets from the same input:

SamplePowerRequirementHistorizedData
| fork
    Dataset2 = (where twinId  <> "p_sol_01" | summarize count() by twinId, name)
    Dataset3 = (summarize count() by WeekOfYear = week_of_year(timestamp))

Almost all of the known features of the KQL language can be used inside every single fork subquery. However, there are exceptions: for instance, the join operator inside a fork subquery does not work; it isn't allowed by the engine.

15 - getschema operator

Learn how to use the getschema operator to create a tabular schema of the input.

Produces a table that represents a tabular schema of the input.

Syntax

T | getschema

Example

StormEvents
| getschema

Output

ColumnName | ColumnOrdinal | DataType | ColumnType
StartTime | 0 | System.DateTime | datetime
EndTime | 1 | System.DateTime | datetime
EpisodeId | 2 | System.Int32 | int
EventId | 3 | System.Int32 | int
State | 4 | System.String | string
EventType | 5 | System.String | string
InjuriesDirect | 6 | System.Int32 | int
InjuriesIndirect | 7 | System.Int32 | int
DeathsDirect | 8 | System.Int32 | int
DeathsIndirect | 9 | System.Int32 | int
DamageProperty | 10 | System.Int32 | int
DamageCrops | 11 | System.Int32 | int
Source | 12 | System.String | string
BeginLocation | 13 | System.String | string
EndLocation | 14 | System.String | string
BeginLat | 15 | System.Double | real
BeginLon | 16 | System.Double | real
EndLat | 17 | System.Double | real
EndLon | 18 | System.Double | real
EpisodeNarrative | 19 | System.String | string
EventNarrative | 20 | System.String | string
StormSummary | 21 | System.Object | dynamic

16 - invoke operator

Learn how to use the invoke operator to invoke a lambda expression that receives the source of invoke as a tabular parameter argument.

Invokes a lambda expression that receives the source of invoke as a tabular argument.

Syntax

T | invoke function([param1, param2])

Parameters

Name | Type | Required | Description
T | string | ✔️ | The tabular source.
function | string | ✔️ | The name of the lambda let expression or stored function to be evaluated.
param1, param2 | string | | Any additional lambda arguments to pass to the function.

Returns

Returns the result of the evaluated expression.

Example

This example shows how to use the invoke operator to call a lambda let expression:

// clipped_average(): calculates percentile limits, and then makes another
//                    pass over the data to calculate the average of values inside the percentiles
let clipped_average = (T:(x: long), lowPercentile:double, upPercentile:double)
{
   let high = toscalar(T | summarize percentiles(x, upPercentile));
   let low = toscalar(T | summarize percentiles(x, lowPercentile));
   T 
   | where x > low and x < high
   | summarize avg(x) 
};
range x from 1 to 100 step 1
| invoke clipped_average(5, 99)

Output

avg_x
52

17 - lookup operator

Learn how to use the lookup operator to extend columns of a fact table.

Extends the columns of a fact table with values looked-up in a dimension table.

For example, the following query results in a table that extends the FactTable ($left) with data from the DimensionTable ($right) by performing a lookup. The lookup matches each pair (CommonColumn, Col1) from FactTable with each pair (CommonColumn, Col2) in the DimensionTable. For the differences between fact and dimension tables, see fact and dimension tables.

FactTable | lookup kind=leftouter (DimensionTable) on CommonColumn, $left.Col1 == $right.Col2

The lookup operator performs an operation similar to the join operator with the following differences:

  • The result doesn’t repeat columns from the $right table that are the basis for the join operation.
  • Only two kinds of lookup are supported, leftouter and inner, with leftouter being the default.
  • In terms of performance, the system by default assumes that the $left table is the larger (facts) table, and the $right table is the smaller (dimensions) table. This is exactly opposite to the assumption used by the join operator.
  • The lookup operator automatically broadcasts the $right table to the $left table (essentially, behaves as if hint.broadcast was specified). This limits the size of the $right table.

Syntax

LeftTable | lookup [kind = (leftouter|inner)] (RightTable) on Attributes

Parameters

Name | Type | Required | Description
LeftTable | string | ✔️ | The table or tabular expression that is the basis for the lookup. Denoted as $left.
RightTable | string | ✔️ | The table or tabular expression that is used to “populate” new columns in the fact table. Denoted as $right.
Attributes | string | ✔️ | A comma-delimited list of one or more rules that describe how rows from LeftTable are matched to rows from RightTable. Multiple rules are evaluated using the and logical operator. See Rules.
kind | string | | Determines how to treat rows in LeftTable that have no match in RightTable. By default, leftouter is used, which means all those rows appear in the output with null values used for the missing values of RightTable columns added by the operator. If inner is used, such rows are omitted from the output. Other kinds of join aren’t supported by the lookup operator.

Rules

Rule kind | Syntax | Predicate
Equality by name | ColumnName | where LeftTable.ColumnName == RightTable.ColumnName
Equality by value | $left.LeftColumn == $right.RightColumn | where $left.LeftColumn == $right.RightColumn

Returns

A table with:

  • A column for every column in each of the two tables, including the matching keys. The columns of the right side are automatically renamed if there are name conflicts.
  • A row for every match between the input tables. A match is a row selected from one table that has the same value for all the on fields as a row in the other table.
  • The Attributes (lookup keys) appear only once in the output table.
  • If kind is unspecified or kind=leftouter, then in addition to the inner matches, there’s a row for every row on the left, even if it has no match. In that case, the unmatched output cells contain nulls.
  • If kind=inner, then there’s a row in the output for every combination of matching rows from left and right.

Example

The following example shows how to perform a left outer join between the FactTable and DimTable, based on matching values in the Personal and Family columns.

let FactTable=datatable(Row:string,Personal:string,Family:string) [
  "1", "Rowan",   "Murphy",
  "2", "Ellis",   "Turner",
  "3", "Ellis",   "Turner",
  "4", "Maya",  "Robinson",
  "5", "Quinn",    "Campbell"
];
let DimTable=datatable(Personal:string,Family:string,Alias:string) [
  "Rowan",  "Murphy",   "rowanm",
  "Ellis",  "Turner", "ellist",
  "Maya", "Robinson", "mayar",
  "Quinn",   "Campbell",    "quinnc"
];
FactTable
| lookup kind=leftouter DimTable on Personal, Family

Output

Row | Personal | Family | Alias
1 | Rowan | Murphy | rowanm
2 | Ellis | Turner | ellist
3 | Ellis | Turner | ellist
4 | Maya | Robinson | mayar
5 | Quinn | Campbell | quinnc

18 - mv-apply operator

Learn how to use the mv-apply operator to apply a subquery to each record and union the results of each subquery.

Applies a subquery to each record, and returns the union of the results of all subqueries.

For example, assume a table T has a column Metric of type dynamic whose values are arrays of real numbers. The following query locates the two biggest values in each Metric value, and returns the records corresponding to those values.

T | mv-apply Metric to typeof(real) on 
(
   top 2 by Metric desc
)

The mv-apply operator has the following processing steps:

  1. Uses the mv-expand operator to expand each record in the input into subtables (order is preserved).
  2. Applies the subquery for each of the subtables.
  3. Adds zero or more columns to the resulting subtable. These columns contain the values of the source columns that aren’t expanded, and are repeated where needed.
  4. Returns the union of the results.

The mv-apply operator gets the following inputs:

  1. One or more expressions that evaluate into dynamic arrays to expand. The number of records in each expanded subtable is the maximum length of each of those dynamic arrays. Null values are added where multiple expressions are specified and the corresponding arrays have different lengths.

  2. Optionally, the names to assign the values of the expressions after expansion. These names become the columns names in the subtables. If not specified, the original name of the column is used when the expression is a column reference. A random name is used otherwise.

    [!NOTE] It is recommended to use the default column names.

  3. The data types of the elements of those dynamic arrays, after expansion. These become the column types of the columns in the subtables. If not specified, dynamic is used.

  4. Optionally, the name of a column to add to the subtables that specifies the 0-based index of the element in the array that resulted in the subtable record.

  5. Optionally, the maximum number of array elements to expand.

The mv-apply operator can be thought of as a generalization of the mv-expand operator (in fact, the latter can be implemented by the former, if the subquery includes only projections.)
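
For instance, a minimal sketch of this equivalence: when the subquery only projects the expanded column, mv-apply collapses to a plain mv-expand, so the two queries below return the same rows.

datatable (a: int, b: dynamic) [1, dynamic([10, 20])]
| mv-expand b to typeof(long)

datatable (a: int, b: dynamic) [1, dynamic([10, 20])]
| mv-apply b to typeof(long) on (project b)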

Syntax

T | mv-apply [ItemIndex] ColumnsToExpand [RowLimit] on ( SubQuery )

Where ItemIndex has the syntax:

with_itemindex = IndexColumnName

ColumnsToExpand is a comma-separated list of one or more elements of the form:

[Name =] ArrayExpression [to typeof (Typename)]

RowLimit is simply:

limit RowLimit

and SubQuery has the same syntax of any query statement.

Parameters

Name | Type | Required | Description
ItemIndex | string | | Indicates the name of a column of type long that’s appended to the input as part of the array-expansion phase and indicates the 0-based array index of the expanded value.
Name | string | | The name to assign the array-expanded values of each array-expanded expression. If not specified, the name of the column is used if available. A random name is generated if ArrayExpression isn’t a simple column name.
ArrayExpression | dynamic | ✔️ | The array whose values are array-expanded. If the expression is the name of a column in the input, the input column is removed from the input and a new column of the same name, or ColumnName if specified, appears in the output.
Typename | string | | The name of the type that the individual elements of the dynamic array ArrayExpression take. Elements that don’t conform to this type are replaced by a null value. If unspecified, dynamic is used by default.
RowLimit | int | | A limit on the number of records to generate from each record of the input. If unspecified, 2147483647 is used.
SubQuery | string | | A tabular query expression with an implicit tabular source that gets applied to each array-expanded subtable.

Examples

Review the examples and run them in your Data Explorer query page.

Getting the largest element from the array

The query outputs the largest odd number (7) and the largest even number (8).

let _data =
    range x from 1 to 8 step 1
    | summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply element=l to typeof(long) on 
    (
    top 1 by element
    )

Output

xMod2 | l | element
1 | [1, 3, 5, 7] | 7
0 | [2, 4, 6, 8] | 8

Calculating the sum of the largest two elements in an array

The query outputs the sum of the top 2 even numbers (6 + 8 = 14) and the sum of the top 2 odd numbers (5 + 7 = 12).

let _data =
    range x from 1 to 8 step 1
    | summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply l to typeof(long) on
    (
    top 2 by l
    | summarize SumOfTop2=sum(l)
    )

Output

xMod2 | l | SumOfTop2
1 | [1,3,5,7] | 12
0 | [2,4,6,8] | 14

Select elements in arrays

The query identifies the top 2 elements from each dynamic array based on the Arr2 values and summarizes them into new lists.

datatable (Val:int, Arr1:dynamic, Arr2:dynamic)
[ 1, dynamic(['A1', 'A2', 'A3']),       dynamic([10, 30, 7]), 
  7, dynamic(['B1', 'B2', 'B5']),       dynamic([15, 11, 50]),
  3, dynamic(['C1', 'C2', 'C3', 'C4']), dynamic([6, 40, 20, 8])
] 
| mv-apply NewArr1=Arr1, NewArr2=Arr2 to typeof(long) on (
 top 2 by NewArr2
 | summarize NewArr1=make_list(NewArr1), NewArr2=make_list(NewArr2)
)

Output

Val | Arr1 | Arr2 | NewArr1 | NewArr2
1 | ["A1","A2","A3"] | [10,30,7] | ["A2","A1"] | [30,10]
7 | ["B1","B2","B5"] | [15,11,50] | ["B5","B1"] | [50,15]
3 | ["C1","C2","C3","C4"] | [6,40,20,8] | ["C2","C3"] | [40,20]

Using with_itemindex for working with a subset of the array

The query results in a table with rows where the index is 3 or greater, including the index and element values from the original lists of even and odd numbers.

let _data =
    range x from 1 to 10 step 1
    | summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply with_itemindex=index element=l to typeof(long) on 
    (
    // here you have 'index' column
    where index >= 3
    )
| project index, element

Output

index | element
3 | 7
4 | 9
3 | 8
4 | 10

Using multiple columns to join elements of two arrays

The query combines elements from two dynamic arrays into a new concatenated format and then summarizes them into lists.

datatable (Val: int, Arr1: dynamic, Arr2: dynamic)
[
    1, dynamic(['A1', 'A2', 'A3']), dynamic(['B1', 'B2', 'B3']), 
    5, dynamic(['C1', 'C2']), dynamic(['D1', 'D2'])
] 
| mv-apply Arr1, Arr2 on (
    extend Out = strcat(Arr1, "_", Arr2)
    | summarize Arr1 = make_list(Arr1), Arr2 = make_list(Arr2), Out= make_list(Out)
    )

Output

Val | Arr1 | Arr2 | Out
1 | ["A1","A2","A3"] | ["B1","B2","B3"] | ["A1_B1","A2_B2","A3_B3"]
5 | ["C1","C2"] | ["D1","D2"] | ["C1_D1","C2_D2"]

Applying mv-apply to a property bag

This query dynamically removes properties from the packed values object when their values don't start with "555". The final result contains the original columns with the unwanted properties removed.

datatable(SourceNumber: string, TargetNumber: string, CharsCount: long)
[
    '555-555-1234', '555-555-1212', 46,
    '555-555-1212', '', int(null)
]
| extend values = pack_all()
| mv-apply removeProperties = values on 
    (
    mv-expand kind = array values
    | where values[1] !startswith "555"
    | summarize propsToRemove = make_set(values[0])
    )
| extend values = bag_remove_keys(values, propsToRemove)
| project-away propsToRemove

Output

SourceNumber | TargetNumber | CharsCount | values
555-555-1234 | 555-555-1212 | 46 | {"SourceNumber": "555-555-1234", "TargetNumber": "555-555-1212"}
555-555-1212 | | | {"SourceNumber": "555-555-1212"}

19 - mv-expand operator

Learn how to use the mv-expand operator to expand multi-value dynamic arrays or property bags into multiple records.

Expands multi-value dynamic arrays or property bags into multiple records.

mv-expand can be described as the opposite of the aggregation operators that pack multiple values into a single dynamic-typed array or property bag, such as summarize make_list() and make-series. Each element in the (scalar) array or property bag generates a new record in the output of the operator. All columns of the input that aren’t expanded are duplicated to all the records in the output.

Syntax

T | mv-expand [kind=(bag | array)] [with_itemindex= IndexColumnName] ColumnName [to typeof(Typename)] [, ColumnName …] [limit Rowlimit]

T | mv-expand [kind=(bag | array)] [Name =] ArrayExpression [to typeof(Typename)] [, [Name =] ArrayExpression [to typeof(Typename)] …] [limit Rowlimit]

Parameters

Name | Type | Required | Description
ColumnName, ArrayExpression | string | ✔️ | A column reference, or a scalar expression with a value of type dynamic that holds an array or a property bag. The individual top-level elements of the array or property bag get expanded into multiple records. When ArrayExpression is used and Name doesn’t equal any input column name, the expanded value is extended into a new column in the output. Otherwise, the existing ColumnName is replaced.
Name | string | | A name for the new column.
Typename | string | ✔️ | Indicates the underlying type of the array’s elements, which becomes the type of the column produced by the mv-expand operator. The operation of applying type is cast-only and doesn’t include parsing or type-conversion. Array elements that don’t conform with the declared type become null values.
RowLimit | int | | The maximum number of rows generated from each original row. The default is 2147483647. mvexpand is a legacy and obsolete form of the operator mv-expand. The legacy version has a default row limit of 128.
IndexColumnName | string | | If with_itemindex is specified, the output includes another column named IndexColumnName that contains the index starting at 0 of the item in the original expanded collection.

Returns

For each record in the input, the operator returns zero, one, or many records in the output, as determined in the following way:

  1. Input columns that aren’t expanded appear in the output with their original value. If a single input record is expanded into multiple output records, the value is duplicated to all records.

  2. For each ColumnName or ArrayExpression that is expanded, the number of output records is determined for each value as explained in modes of expansion. For each input record, the maximum number of output records is calculated. All arrays or property bags are expanded “in parallel” so that missing values (if any) are replaced by null values. Elements are expanded into rows in the order that they appear in the original array/bag.

  3. If the dynamic value is null, then a single record is produced for that value (null). If the dynamic value is an empty array or property bag, no record is produced for that value. Otherwise, as many records are produced as there are elements in the dynamic value.
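
A minimal sketch of point 3: the null value produces one output row, while the empty array produces none.

datatable (a: int, b: dynamic)
[
    1, dynamic(null),
    2, dynamic([])
]
| mv-expand b
// Returns a single row (a=1, b=null); the row with the empty array is dropped.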

The expanded columns are of type dynamic, unless they’re explicitly typed by using the to typeof() clause.

Modes of expansion

Two modes of property bag expansions are supported:

  • kind=bag or bagexpansion=bag: Property bags are expanded into single-entry property bags. This mode is the default mode.
  • kind=array or bagexpansion=array: Property bags are expanded into two-element [key,value] array structures, allowing uniform access to keys and values. This mode also allows, for example, running a distinct-count aggregation over property names.
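
For instance, a minimal sketch of a distinct count over property names with kind=array:

datatable (b: dynamic)
[
    dynamic({"prop1": "a1", "prop2": "b1"}),
    dynamic({"prop1": "a2", "prop3": "c2"})
]
| mv-expand kind=array b
| summarize dcount(tostring(b[0]))  // 3 distinct property names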

Examples

The examples in this section show how to use the syntax to help you get started.

Single column - array expansion

datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20]),
    2, dynamic(['a', 'b'])
]
| mv-expand b

Output

a | b
1 | 10
1 | 20
2 | a
2 | b

Single column - bag expansion

A simple expansion of a single column:

datatable (a: int, b: dynamic)
[
    1, dynamic({"prop1": "a1", "prop2": "b1"}),
    2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand b

Output

a | b
1 | {"prop1": "a1"}
1 | {"prop2": "b1"}
2 | {"prop1": "a2"}
2 | {"prop2": "b2"}

Single column - bag expansion to key-value pairs

A simple bag expansion to key-value pairs:

datatable (a: int, b: dynamic)
[
    1, dynamic({"prop1": "a1", "prop2": "b1"}),
    2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand kind=array b 
| extend key = b[0], val=b[1]

Output

a | b | key | val
1 | ["prop1","a1"] | prop1 | a1
1 | ["prop2","b1"] | prop2 | b1
2 | ["prop1","a2"] | prop1 | a2
2 | ["prop2","b2"] | prop2 | b2

Zipped two columns

Expanding two columns will first ‘zip’ the applicable columns and then expand them:

datatable (a: int, b: dynamic, c: dynamic)[
    1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 4, 3])
]
| mv-expand b, c

Output

a | b | c
1 | {"prop1":"a"} | 5
1 | {"prop2":"b"} | 4
1 | | 3

Cartesian product of two columns

If you want to get a Cartesian product of expanding two columns, expand one after the other:

datatable (a: int, b: dynamic, c: dynamic)
[
    1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 6])
]
| mv-expand b
| mv-expand c

Output

a | b | c
1 | { "prop1": "a"} | 5
1 | { "prop1": "a"} | 6
1 | { "prop2": "b"} | 5
1 | { "prop2": "b"} | 6

Convert output

To force the output of an mv-expand to a certain type (default is dynamic), use to typeof:

datatable (a: string, b: dynamic, c: dynamic)[
    "Constant", dynamic([1, 2, 3, 4]), dynamic([6, 7, 8, 9])
]
| mv-expand b, c to typeof(int)
| getschema 

Output

ColumnName | ColumnOrdinal | DataType | ColumnType
a | 0 | System.String | string
b | 1 | System.Object | dynamic
c | 2 | System.Int32 | int

Notice column b is returned as dynamic while c is returned as int.

Using with_itemindex

Expansion of an array with with_itemindex:

range x from 1 to 4 step 1
| summarize x = make_list(x)
| mv-expand with_itemindex=Index x

Output

x | Index
1 | 0
2 | 1
3 | 2
4 | 3

20 - parse operator

Learn how to use the parse operator to parse the value of a string expression into one or more calculated columns.

Evaluates a string expression and parses its value into one or more calculated columns. The calculated columns return null values for unsuccessfully parsed strings. If there’s no need to use rows where parsing doesn’t succeed, prefer using the parse-where operator.

Syntax

T | parse [kind=kind [flags=regexFlags]] expression with [*] stringConstant columnName [: columnType] [*], …

Parameters

Name | Type | Required | Description
T | string | ✔️ | The tabular input to parse.
kind | string | ✔️ | One of the supported kind values. The default value is simple.
regexFlags | string | | If kind is regex, then you can specify regex flags to be used, like U for ungreedy, m for multi-line mode, s for match new line \n, and i for case-insensitive. More flags can be found in Flags.
expression | string | ✔️ | An expression that evaluates to a string.
stringConstant | string | ✔️ | A string constant for which to search and parse.
columnName | string | ✔️ | The name of a column to assign a value to, extracted from the string expression.
columnType | string | | The scalar value that indicates the type to convert the value to. The default is string.

Supported kind values

Text | Description
simple | This is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types.
regex | stringConstant can be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types.
relaxed | stringConstant is a regular string value and the match is relaxed. All string delimiters should appear in the parsed string, but extended columns might partially match the required types. Extended columns that didn’t match the required types get the value null.

Regex mode

In regex mode, parse translates the pattern to a regex. Use regular expressions to do the matching and use numbered captured groups that are handled internally. For example:

parse kind=regex Col with * <regex1> var1:string <regex2> var2:long

In the parse statement, the regex internally generated by the parse is .*?<regex1>(.*?)<regex2>(\-\d+).

  • * was translated to .*?.

  • string was translated to .*?.

  • long was translated to \-\d+.
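
A minimal sketch of this translation in action (the input string and the column names var1 and var2 are illustrative):

print Col = "id=alpha score=-42"
| parse kind=regex Col with * "id=" var1: string " score=" var2: long
// var1 == "alpha", var2 == -42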

Returns

The input table extended according to the list of columns that are provided to the operator.

Examples

The examples in this section show how to use the syntax to help you get started.

The parse operator provides a streamlined way to extend a table by using multiple extract applications on the same string expression. This result is useful when the table has a string column that contains several values that you want to break into individual columns. For example, a column that’s produced by a developer trace ("printf"/"Console.WriteLine") statement.

Parse and extend results

In the following example, the column EventText of table Traces contains strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices={1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5}). The operation extends the table with six columns: resourceName, totalSlices, sliceNumber, lockTime, releaseTime, and previousLockTime.

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *  
| project resourceName, totalSlices, sliceNumber, lockTime, releaseTime, previousLockTime

Output

resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime
PipelineScheduler | 27 | 15 | 02/17/2016 08:40:00 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:39:00.0000000
PipelineScheduler | 27 | 23 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000
PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000
PipelineScheduler | 27 | 16 | 02/17/2016 08:41:00 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000

Extract email alias and DNS

In the following example, entries in the Contacts column are parsed to extract the alias and domain from an email address, and the domain from a website URL. The query returns the EmailAddress, EmailAlias, and WebsiteDomain columns, where the EmailAddress column combines the parsed alias and domain.

let Leads=datatable(Contacts: string)
    [
    "Event: LeadContact (email=john@contosohotel.com, Website=https:contosohotel.com)",
	"Event: LeadContact (email=abi@fourthcoffee.com, Website=https:www.fourthcoffee.com)",
	"Event: LeadContact (email=nevena@treyresearch.com, Website=https:treyresearch.com)",
	"Event: LeadContact (email=faruk@tailspintoys.com, Website=https:tailspintoys.com)",
	"Event: LeadContact (email=ebere@relecloud.com, Website=https:relecloud.com)",
];
Leads
| parse Contacts with * "email=" alias:string "@" domain: string ", Website=https:" WebsiteDomain: string ")"
| project EmailAddress=strcat(alias, "@", domain), EmailAlias=alias, WebsiteDomain

Output

EmailAddress | EmailAlias | WebsiteDomain
nevena@treyresearch.com | nevena | treyresearch.com
john@contosohotel.com | john | contosohotel.com
faruk@tailspintoys.com | faruk | tailspintoys.com
ebere@relecloud.com | ebere | relecloud.com
abi@fourthcoffee.com | abi | www.fourthcoffee.com

Regex mode

In the following example, regular expressions are used to parse and extract data from the EventText column. The extracted data is projected into new fields.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse kind=regex EventText with "(.*?)[a-zA-Z]*=" resourceName @", totalSlices=\s*\d+\s*.*?sliceNumber=" sliceNumber: long  ".*?(previous)?lockTime=" lockTime ".*?releaseTime=" releaseTime ".*?previousLockTime=" previousLockTime: date "\\)"  
| project resourceName, sliceNumber, lockTime, releaseTime, previousLockTime

Output

resourceName | sliceNumber | lockTime | releaseTime | previousLockTime
PipelineScheduler | 15 | 02/17/2016 08:40:00, | 02/17/2016 08:40:00, | 2016-02-17 08:39:00.0000000
PipelineScheduler | 23 | 02/17/2016 08:40:01, | 02/17/2016 08:40:01, | 2016-02-17 08:39:01.0000000
PipelineScheduler | 20 | 02/17/2016 08:40:01, | 02/17/2016 08:40:01, | 2016-02-17 08:39:01.0000000
PipelineScheduler | 16 | 02/17/2016 08:41:00, | 02/17/2016 08:41:00, | 2016-02-17 08:40:00.0000000
PipelineScheduler | 22 | 02/17/2016 08:41:01, | 02/17/2016 08:41:00, | 2016-02-17 08:40:01.0000000

Regex mode with regex flags

In the following example, resourceName is extracted.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex EventText with * "resourceName=" resourceName ',' *
| project resourceName

Output

resourceName
PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01
PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00
PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01
PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00
PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00

If there are records where resourceName sometimes appears in lower-case and sometimes in upper-case, you might get nulls for some values.

The results in the previous example are unexpected and include the full event data, because the default mode is greedy. To extract only resourceName, run the previous query with the non-greedy (U) and case-insensitive (i) regex flags.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=Ui EventText with * "RESOURCENAME=" resourceName ',' *
| project resourceName

Output

resourceName
PipelineScheduler
PipelineScheduler
PipelineScheduler
PipelineScheduler
PipelineScheduler

If the parsed string contains newlines, use the s flag to parse the text.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=23\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=15\nlockTime=02/17/2016 08:40:00\nreleaseTime=02/17/2016 08:40:00\npreviousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=20\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=22\nlockTime=02/17/2016 08:41:01\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=16\nlockTime=02/17/2016 08:41:00\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=s EventText with * "resourceName=" resourceName: string "(.*?)totalSlices=" totalSlices: long "(.*?)lockTime=" lockTime: datetime "(.*?)releaseTime=" releaseTime: datetime "(.*?)previousLockTime=" previousLockTime: datetime "\\)" 
| project-away EventText

Output

| resourceName | totalSlices | lockTime | releaseTime | previousLockTime |
|---|---|---|---|---|
| PipelineScheduler | 27 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:39:00.0000000 |
| PipelineScheduler | 27 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
| PipelineScheduler | 27 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
| PipelineScheduler | 27 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
| PipelineScheduler | 27 | 2016-02-17 08:41:01.0000000 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |

Relaxed mode

In the following relaxed mode example, the extended column totalSlices must be of type long. However, in the parsed string, it has the value nonValidLongValue. For the extended column, releaseTime, the value nonValidDateTime can’t be parsed as datetime. These two extended columns result in null values while the other columns, such as sliceNumber, still result in the correct values.

If you use kind=simple for the following query, you get null results for all extended columns in the records that fail to parse. This strictness about extended columns is the difference between the relaxed and simple modes.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=nonValidDateTime, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=relaxed EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project-away EventText

Output

| resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
|---|---|---|---|---|---|
| PipelineScheduler | 27 | 15 | 02/17/2016 08:40:00 | | 2016-02-17 08:39:00.0000000 |
| PipelineScheduler | 27 | 23 | 02/17/2016 08:40:01 | | 2016-02-17 08:39:01.0000000 |
| PipelineScheduler | | 20 | 02/17/2016 08:40:01 | | 2016-02-17 08:39:01.0000000 |
| PipelineScheduler | | 16 | 02/17/2016 08:41:00 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
| PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
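
For comparison, here's a minimal sketch of simple mode on shortened, hypothetical event strings (the Traces table and field values here are illustrative): because simple mode is strict, a record whose totalSlices value can't be cast to long gets null in all extended columns, not only the failing one.

let Traces = datatable(EventText: string)
[
    "Event: NotifySliceRelease (totalSlices=27, sliceNumber=23)",
    "Event: NotifySliceRelease (totalSlices=nonValidLongValue, sliceNumber=16)"
];
Traces
| parse kind=simple EventText with * "totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long ")"
| project totalSlices, sliceNumber

The first record parses fully; the second yields null for both totalSlices and sliceNumber. With kind=relaxed, the second record would keep sliceNumber = 16.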

21 - parse-kv operator

Learn how to use the parse-kv operator to represent structured information extracted from a string expression in a key/value form.

Extracts structured information from a string expression and represents the information in a key/value form.

The following extraction modes are supported:

  • Specified delimiter: Extraction based on well-defined delimiters that separate key/value pairs and keys from values.
  • Nonspecified delimiter: Extraction with no clearly defined delimiters; any nonalphanumeric character serves as a delimiter.
  • Regex: Extraction based on a regular expression.

Syntax

Specified delimiter

T | parse-kv Expression as ( KeysList ) with ( pair_delimiter = PairDelimiter , kv_delimiter = KvDelimiter [, quote = QuoteChars … [, escape = EscapeChar …]] [, greedy = true] )

Nonspecified delimiter

T | parse-kv Expression as ( KeysList ) with ( [quote = QuoteChars … [, escape = EscapeChar …]] )

Regex

T | parse-kv Expression as ( KeysList ) with ( regex = RegexPattern )

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| Expression | string | ✔️ | The expression from which to extract key values. |
| KeysList | string | ✔️ | A comma-separated list of key names and their value data types. The order of the keys doesn't have to match the order in which they appear in the text. |
| PairDelimiter | string | | A delimiter that separates key value pairs from each other. |
| KvDelimiter | string | | A delimiter that separates keys from values. |
| QuoteChars | string | | A one- or two-character string literal representing opening and closing quotes that key name or the extracted value may be wrapped with. The parameter can be repeated to specify a separate set of opening/closing quotes. |
| EscapeChar | string | | A one-character string literal describing a character that may be used for escaping special characters in a quoted value. The parameter can be repeated if multiple escape characters are used. |
| RegexPattern | string | | A regular expression containing two capturing groups exactly. The first group represents the key name, and the second group represents the key value. |

Returns

The original input tabular expression T, extended with columns per specified keys to extract.

Examples

The examples in this section show how to use the syntax to help you get started.

Extraction with well-defined delimiters

In this query, keys and values are separated by well-defined delimiters: comma and colon characters.

print str="ThreadId:458745723, Machine:Node001, Text: The service is up, Level: Info"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (pair_delimiter=',', kv_delimiter=':')
| project-away str

Output

| Text | ThreadId | Machine |
|---|---|---|
| The service is up | 458745723 | Node001 |

Extraction with value quoting

Sometimes key names or values are wrapped in quotes, which allows the values themselves to contain delimiter characters. The following examples show how the quote argument is used to extract such values.

print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="connection aborted" "event time"=2021-01-01T10:00:54'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"')
| project-away str

Output

| event time | src | dst | bytes | failure |
|---|---|---|---|---|
| 2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | connection aborted |

This query uses different opening and closing quotes:

print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure=(connection aborted) (event time)=(2021-01-01 10:00:54)'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='()')
| project-away str

Output

| event time | src | dst | bytes | failure |
|---|---|---|---|---|
| 2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | connection aborted |

The values themselves may contain properly escaped quote characters, as the following example shows:

print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="the remote host sent \\"bye!\\"" time=2021-01-01T10:00:54'
| parse-kv str as (['time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"', escape='\\')
| project-away str

Output

| time | src | dst | bytes | failure |
|---|---|---|---|---|
| 2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | the remote host sent "bye!" |

Extraction in greedy mode

There are cases when unquoted values can contain pair delimiters. In this case, use greedy mode, which tells the operator to scan until the next key appearance (or the end of the string) when looking for the end of a value.

The following examples compare how the operator works with and without the greedy mode specified:

print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=')
| project-away str

Output

| name | phone | city |
|---|---|---|
| John | 555 | New |

print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=', greedy=true)
| project-away str

Output

| name | phone | city |
|---|---|---|
| John Doe | 555 5555 | New York |

Extraction with no well-defined delimiters

In the following example, any nonalphanumeric character is considered a valid delimiter:

print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: Started"
| parse-kv str as (Text: string, ThreadId:long, Machine: string)
| project-away str

Output

| Text | ThreadId | Machine |
|---|---|---|
| Started | 458745723 | Node001 |

Value quoting and escaping are allowed in this mode, as shown in the following example:

print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: 'The service \\' is up'"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (quote="'", escape='\\')
| project-away str

Output

| Text | ThreadId | Machine |
|---|---|---|
| The service ' is up | 458745723 | Node001 |

Extraction using regex

When no delimiters define text structure enough, regular expression-based extraction can be useful.

print str=@'["referer url: https://hostname.com/redirect?dest=/?h=1234", "request url: https://hostname.com/?h=1234", "advertiser id: 24fefbca-cf27-4d62-a623-249c2ad30c73"]'
| parse-kv str as (['referer url']:string, ['request url']:string, ['advertiser id']: guid) with (regex=@'"([\w ]+)\s*:\s*([^"]*)"')
| project-away str

Output

| referer url | request url | advertiser id |
|---|---|---|
| https://hostname.com/redirect?dest=/?h=1234 | https://hostname.com/?h=1234 | 24fefbca-cf27-4d62-a623-249c2ad30c73 |

22 - parse-where operator

Learn how to use the parse-where operator to parse the value of a string expression into one or more calculated columns.

Evaluates a string expression, and parses its value into one or more calculated columns. The result is only the successfully parsed strings.

parse-where parses the strings in the same way as parse, and filters out strings that were not parsed successfully.

See parse operator, which produces nulls for unsuccessfully parsed strings.

Syntax

T | parse-where [kind=kind [flags=regexFlags]] expression with * (stringConstant columnName [: columnType]) *

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The tabular input to parse. |
| kind | string | ✔️ | One of the supported kind values. The default value is simple. |
| regexFlags | string | | If kind is regex, you can specify regex flags to be used, like U for ungreedy, m for multi-line mode, s for matching new line \n, and i for case-insensitive. More flags can be found in Flags. |
| expression | string | ✔️ | An expression that evaluates to a string. |
| stringConstant | string | ✔️ | A string constant for which to search and parse. |
| columnName | string | ✔️ | The name of a column to assign a value to, extracted from the string expression. |
| columnType | string | | The scalar value that indicates the type to convert the value to. The default is string. |

Supported kind values

| Text | Description |
|---|---|
| simple | This is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types. |
| regex | stringConstant may be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types. |

Regex mode

In regex mode, parse-where translates the pattern to a regex and uses regular expressions to do the matching, with numbered capture groups that are handled internally. For example:

parse-where kind=regex Col with * <regex1> var1:string <regex2> var2:long

The regex generated internally by parse-where is .*?<regex1>(.*?)<regex2>(\-\d+).

  • * was translated to .*?.
  • string was translated to .*?.
  • long was translated to \-\d+.
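
As a minimal illustration of this translation, the following sketch (hypothetical data and column names) extracts a string and a long; the second row fails the \-\d+ match for the long column, so parse-where filters it out:

datatable(Col: string)
[
    "x=hello y=-42",
    "x=world y=oops"
]
| parse-where kind=regex Col with * "x=" var1: string " y=" var2: long

Only the first row is returned, with var1 set to hello and var2 set to -42.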

Returns

The input table, which is extended according to the list of columns that are provided to the operator.

Examples

The examples in this section show how to use the syntax to help you get started.

The parse-where operator provides a streamlined way to extend a table by using multiple extract applications on the same string expression. This is most useful when the table has a string column that contains several values that you want to break into individual columns. For example, you can break up a column that was produced by a developer trace ("printf"/"Console.WriteLine") statement.

Using parse

In the example below, the column EventText of table Traces contains strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices={1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5}). The operation below extends the table with six columns: resourceName, totalSlices, sliceNumber, lockTime, releaseTime, and previousLockTime.

A few of the strings don’t have a full match.

Using parse, the calculated columns in those records are filled with nulls.

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *  
| project
    resourceName,
    totalSlices,
    sliceNumber,
    lockTime,
    releaseTime,
    previousLockTime

Output

| resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
|---|---|---|---|---|---|
| | | | | | |
| | | | | | |
| | | | | | |
| PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
| PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |

Using parse-where

Using parse-where filters out unsuccessfully parsed strings from the result.

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse-where EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *  
| project
    resourceName,
    totalSlices,
    sliceNumber,
    lockTime,
    releaseTime,
    previousLockTime

Output

| resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
|---|---|---|---|---|---|
| PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
| PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |

Regex mode using regex flags

To get the resourceName and totalSlices, use the following query:

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices

Output

| resourceName | totalSlices |
|---|---|

parse-where with case-insensitive regex flag

In the previous query, the matching was case-sensitive by default, so none of the strings were parsed successfully and no results were returned.

To get the required result, run parse-where with a case-insensitive (i) regex flag.

Only three strings are parsed successfully, so the result contains three records; in the other two strings, totalSlices holds an invalid integer.

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex flags=i EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices

Output

| resourceName | totalSlices |
|---|---|
| PipelineScheduler | 27 |
| PipelineScheduler | 27 |
| PipelineScheduler | 27 |

23 - partition operator

Learn how to use the partition operator to partition the records of the input table into multiple subtables.

The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries.

The partition operator is useful when you need to perform a subquery only on a subset of rows that belong to the same partition key, and not a query of the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.

The partition operator supports several strategies of subquery operation:

  • Native - use with an implicit data source with thousands of key partition values.
  • Shuffle - use with an implicit source with millions of key partition values.
  • Legacy - use with an implicit or explicit source for 64 or fewer key partition values.

Syntax

T | partition [ hint.strategy=Strategy ] [ Hints ] by Column ( TransformationSubQuery )

T | partition [ hint.strategy=legacy ] [ Hints ] by Column { SubQueryWithSource }

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The input tabular source. |
| Strategy | string | | The value legacy, shuffle, or native. This hint defines the execution strategy of the partition operator. If no strategy is specified, the legacy strategy is used. For more information, see Strategies. |
| Column | string | ✔️ | The name of a column in T whose values determine how to partition the input tabular source. |
| TransformationSubQuery | string | ✔️ | A tabular transformation expression. The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogeneous on the value of Column. The expression must provide only one tabular result and shouldn't have other types of statements, such as let statements. |
| SubQueryWithSource | string | ✔️ | A tabular expression that includes its own tabular source, such as a table reference. This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column). The expression must provide only one tabular result and shouldn't have other types of statements, such as let statements. |
| Hints | string | | Zero or more space-separated parameters in the form of HintName = Value that control the behavior of the operator. See the supported hints per strategy type. |

Supported hints

| Hint name | Type | Strategy | Description |
|---|---|---|---|
| hint.shufflekey | string | shuffle | The partition key used to run the partition operator with the shuffle strategy. |
| hint.materialized | bool | legacy | If set to true, materializes the source of the partition operator. The default value is false. |
| hint.concurrency | int | legacy | Determines how many partitions to run in parallel. The default value is 16. |
| hint.spread | int | legacy | Determines how to distribute the partitions among cluster nodes. The default value is 1. |

For example, if there are N partitions and the spread hint is set to P, the N partitions are distributed equally among P cluster nodes, where they run in parallel or sequentially depending on the concurrency hint. A sketch of these hints in use appears below.
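
For example, the following sketch runs the legacy strategy with explicit concurrency and spread hints over the Samples StormEvents table (the hint values are illustrative, not recommendations):

StormEvents
| where State startswith 'W'
| partition hint.strategy=legacy hint.concurrency=2 hint.spread=1 by State
    (
    top 2 by DamageProperty
    | project State, EventType, DamageProperty
    )

With these hints, the four W* partitions are processed on one node, two at a time.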

Returns

The operator returns a union of the results of the individual subqueries.

Strategies

The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.

Native strategy

This strategy should be applied when the number of distinct values of the partition key isn’t large, roughly in the thousands.

The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.

To use this strategy, specify hint.strategy=native.

Shuffle strategy

This strategy should be applied when the number of distinct values of the partition key is large, in the millions.

The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.

To use this strategy, specify hint.strategy=shuffle. For more information about shuffle strategy and performance, see shuffle query.

Supported operators for the native and shuffle strategies

The following list of operators can be used in subqueries with the native or shuffle strategies:

Legacy strategy

For historical reasons, the legacy strategy is the default strategy. However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.

In some scenarios, the legacy strategy might be necessary due to its support for including a tabular source in the subquery. In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).

If the subquery is a tabular transformation without a tabular source, the source is implicit and is based on the subtable partitions.

To use this strategy, specify hint.strategy=legacy or omit any other strategy indication.

Examples

The examples in this section show how to use the syntax to help you get started.

Find top values

In some cases, it’s more performant and easier to write a query using the partition operator than using the top-nested operator. The following query runs a subquery calculating summarize and top for each State starting with W: “WYOMING”, “WASHINGTON”, “WEST VIRGINIA”, and “WISCONSIN”.

StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State 
    (
    summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
    | top 3 by Events 
    ) 

Output

| EventType | State | Events | Injuries |
|---|---|---|---|
| Hail | WYOMING | 108 | 0 |
| High Wind | WYOMING | 81 | 5 |
| Winter Storm | WYOMING | 72 | 0 |
| Heavy Snow | WASHINGTON | 82 | 0 |
| High Wind | WASHINGTON | 58 | 13 |
| Wildfire | WASHINGTON | 29 | 0 |
| Thunderstorm Wind | WEST VIRGINIA | 180 | 1 |
| Hail | WEST VIRGINIA | 103 | 0 |
| Winter Weather | WEST VIRGINIA | 88 | 0 |
| Thunderstorm Wind | WISCONSIN | 416 | 1 |
| Winter Storm | WISCONSIN | 310 | 0 |
| Hail | WISCONSIN | 303 | 1 |

Native strategy

The following query returns the top 2 EventType values by TotalInjuries for each State that starts with ‘W’:

StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
    (
    summarize TotalInjuries = sum(InjuriesDirect) by EventType
    | top 2 by TotalInjuries
    )

Output

| EventType | TotalInjuries |
|---|---|
| Tornado | 4 |
| Hail | 1 |
| Thunderstorm Wind | 1 |
| Excessive Heat | 0 |
| High Wind | 13 |
| Lightning | 5 |
| High Wind | 5 |
| Avalanche | 3 |

Shuffle strategy

The following query returns the top 3 DamageProperty values for each EpisodeId, along with the columns EpisodeId and State.

StormEvents
| partition hint.strategy=shuffle by EpisodeId
    (
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
    )
| count

Output

Count
22345

Legacy strategy with explicit source

The following query runs two subqueries:

  • When x == 1, the query returns all rows from StormEvents that have InjuriesIndirect == 1.
  • When x == 2, the query returns all rows from StormEvents that have InjuriesIndirect == 2.

The final result is the union of these two subqueries.

range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count 

Output

Count
113

Partition reference

The following example shows how to use the as operator to give a “name” to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy strategy.

T
| partition by Dim
(
    as Partition
    | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)

24 - print operator

Learn how to use the print operator to output a single row with one or more scalar expression results as columns.

Outputs a single row with one or more scalar expression results as columns.

Syntax

print [ColumnName =] ScalarExpression [, …]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| ColumnName | string | | The name to assign to the output column. |
| ScalarExpression | string | ✔️ | The expression to evaluate. |

Returns

A table with one or more columns and a single row. Each column returns the corresponding value of the evaluated ScalarExpression.

Examples

The examples in this section show how to use the syntax to help you get started.

The following example outputs a row with two columns. One column contains the sum of a series of numbers and the other column contains the value of the variable, x.

print 0 + 1 + 2 + 3 + 4 + 5, x = "Wow!"

Output

| print_0 | x |
|---|---|
| 15 | Wow! |

The following example outputs the results of the strcat() function as a concatenated string.

print banner=strcat("Hello", ", ", "World!")

Output

banner
Hello, World!

25 - Project operator

Learn how to use the project operator to select columns to include, rename or drop, and to insert new computed columns in the output table.

Select the columns to include, rename or drop, and insert new computed columns.

The order of the columns in the result is specified by the order of the arguments. Only the columns specified in the arguments are included in the result. Any other columns in the input are dropped.

Syntax

T | project [ColumnName | (ColumnName[,]) =] Expression [, …]

or

T | project ColumnName [= Expression] [, …]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The tabular input for which to project certain columns. |
| ColumnName | string | | A column name or comma-separated list of column names to appear in the output. |
| Expression | string | | The scalar expression to perform over the input. |

  • Either ColumnName or Expression must be specified.
  • If there's no Expression, then a column of ColumnName must appear in the input.
  • If ColumnName is omitted, the output column name of Expression will be automatically generated.
  • If Expression returns more than one column, a list of column names can be specified in parentheses. If a list of the column names isn't specified, all Expression's output columns with generated names will be added to the output.

Returns

A table with columns that were named as arguments. Contains the same number of rows as the input table.

Examples

The examples in this section show how to use the syntax to help you get started.

Only show specific columns

Only show the EventId, State, EventType of the StormEvents table.

StormEvents
| project EventId, State, EventType

Output

The table shows the first 10 results.

| EventId | State | EventType |
|---|---|---|
| 61032 | ATLANTIC SOUTH | Waterspout |
| 60904 | FLORIDA | Heavy Rain |
| 60913 | FLORIDA | Tornado |
| 64588 | GEORGIA | Thunderstorm Wind |
| 68796 | MISSISSIPPI | Thunderstorm Wind |
| 68814 | MISSISSIPPI | Tornado |
| 68834 | MISSISSIPPI | Thunderstorm Wind |
| 68846 | MISSISSIPPI | Hail |
| 73241 | AMERICAN SAMOA | Flash Flood |
| 64725 | KENTUCKY | Flood |

Potential manipulations using project

The following query renames the BeginLocation column and creates a new column called TotalInjuries from a calculation over two existing columns.

StormEvents
| project StartLocation = BeginLocation, TotalInjuries = InjuriesDirect + InjuriesIndirect
| where TotalInjuries > 5

Output

The table shows the first 10 results.

| StartLocation | TotalInjuries |
|---|---|
| LYDIA | 15 |
| ROYAL | 15 |
| GOTHENBURG | 9 |
| PLAINS | 8 |
| KNOXVILLE | 9 |
| CAROL STREAM | 11 |
| HOLLY | 9 |
| RUFFIN | 9 |
| ENTERPRISE MUNI ARPT | 50 |
| COLLIERVILLE | 6 |
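
Automatically generated column names

When an expression is given without a name, project generates an output column name. A minimal sketch, assuming the Samples StormEvents table; the unnamed strlen(State) expression typically surfaces under a generated name such as Column1:

StormEvents
| project State, strlen(State)
| take 3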

26 - project-away operator

Learn how to use the project-away operator to select columns from the input table to exclude from the output table.

Select what columns from the input table to exclude from the output table.

Syntax

T | project-away ColumnNameOrPattern [, …]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The tabular input from which to remove columns. |
| ColumnNameOrPattern | string | ✔️ | One or more column names or column wildcard-patterns to be removed from the output. |

Returns

A table with columns that weren't named as arguments. Contains the same number of rows as the input table.

Examples

The input table PopulationData has 2 columns: State and Population. Project-away the Population column and you’re left with a list of state names.

PopulationData
| project-away Population

Output

The following table shows only the first 10 results.

State
ALABAMA
ALASKA
ARIZONA
ARKANSAS
CALIFORNIA
COLORADO
CONNECTICUT
DELAWARE
DISTRICT OF COLUMBIA
FLORIDA

Project-away using a column name pattern

This query removes columns starting with the word “session”.

ConferenceSessions
| project-away session*

Output

The table shows only the first 10 results.

conferenceownerparticipantsURLlevelstarttimedurationtime_and_durationkusto_affinity
PASS Summit 2019Avner Aharonihttps://www.eventbrite.com/e/near-real-time-interact-analytics-on-big-data-using-azure-data-explorer-fg-tickets-775327756192019-11-07T19:15:00ZThu, Nov 7, 11:15 AM-12:15 PM PSTFocused
PASS SummitRohan KumarAriel Pisetzkyhttps://www.pass.org/summit/2018/Learn/Keynotes.aspx2018-11-07T08:15:00Z90Wed, Nov 7, 8:15-9:45 amMention
Intelligent Cloud 2019Rohan KumarHenning Rauch2019-04-09T09:00:00Z90Tue, Apr 9, 9:00-10:30 AMMention
Ignite 2019Jie Fenghttps://myignite.techcommunity.microsoft.com/sessions/839401002019-11-06T14:35:00Z20Wed, Nov 6, 9:35 AM - 9:55 AMMention
Ignite 2019Bernhard RodeLe Hai Dang, Ricardo Niepelhttps://myignite.techcommunity.microsoft.com/sessions/815962002019-11-06T16:45:00Z45Wed, Nov 6, 11:45 AM-12:30 PMMention
Ignite 2019Tzvia GitlinTroynahttps://myignite.techcommunity.microsoft.com/sessions/839334002019-11-06T17:30:00Z75Wed, Nov 6, 12:30 PM-1:30 PMFocused
Ignite 2019Jie Fenghttps://myignite.techcommunity.microsoft.com/sessions/810573002019-11-06T20:30:00Z45Wed, Nov 6, 3:30 PM-4:15 PMMention
Ignite 2019Manoj Rahejahttps://myignite.techcommunity.microsoft.com/sessions/839393002019-11-07T18:15:00Z20Thu, Nov 7, 1:15 PM-1:35 PMFocused
Ignite 2019Uri Barashhttps://myignite.techcommunity.microsoft.com/sessions/810603002019-11-08T17:30:00Z45Fri, Nov8, 10:30 AM-11:15 AMFocused
Ignite 2018Manoj Rahejahttps://azure.microsoft.com/resources/videos/ignite-2018-azure-data-explorer-%E2%80%93-query-billions-of-records-in-seconds/20020Focused

27 - project-keep operator

Learn how to use the project-keep operator to select columns from the input to keep in the output.

Select what columns from the input to keep in the output. Only the columns that are specified as arguments will be shown in the result. The other columns are excluded.

Syntax

T | project-keep ColumnNameOrPattern [, …]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The tabular input from which to keep columns. |
| ColumnNameOrPattern | string | ✔️ | One or more column names or column wildcard-patterns to be kept in the output. |

Returns

A table with columns that were named as arguments. Contains the same number of rows as the input table.

Example

This query returns columns from the ConferenceSessions table that contain the word “session”.

ConferenceSessions
| project-keep session*

Output

The output table shows only the first 10 results.

| sessionid | session_title | session_type | session_location |
|---|---|---|---|
| COM64 | Focus Group: Azure Data Explorer | Focus Group | Online |
| COM65 | Focus Group: Azure Data Explorer | Focus Group | Online |
| COM08 | Ask the Team: Azure Data Explorer | Ask the Team | Online |
| COM137 | Focus Group: Built-In Dashboard and Smart Auto Scaling Capabilities in Azure Data Explorer | Focus Group | Online |
| CON-PRT157 | Roundtable: Monitoring and managing your Azure Data Explorer deployments | Roundtable | Online |
| CON-PRT103 | Roundtable: Advanced Kusto query language topics | Roundtable | Online |
| CON-PRT157 | Roundtable: Monitoring and managing your Azure Data Explorer deployments | Roundtable | Online |
| CON-PRT103 | Roundtable: Advanced Kusto query language topics | Roundtable | Online |
| CON-PRT130 | Roundtable: Data exploration and visualization with Azure Data Explorer | Roundtable | Online |
| CON-PRT130 | Roundtable: Data exploration and visualization with Azure Data Explorer | Roundtable | Online |

28 - project-rename operator

Learn how to use the project-rename operator to rename columns in the output table.

Renames columns in the output table.

Syntax

T | project-rename NewColumnName = ExistingColumnName [, …]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The input tabular data. |
| NewColumnName | string | ✔️ | The new column name. |
| ExistingColumnName | string | ✔️ | The name of the existing column to rename. |

Returns

A table that has the columns in the same order as in an existing table, with columns renamed.

Example

If you have a table with columns a, b, and c, and you want to rename a to new_a and b to new_b while keeping the same order, the query would look like this:

print a='alpha', b='bravo', c='charlie'
| project-rename new_a=a, new_b=b, new_c=c

Output

| new_a | new_b | new_c |
|---|---|---|
| alpha | bravo | charlie |

29 - project-reorder operator

Learn how to use the project-reorder operator to reorder columns in the output table.

Reorders columns in the output table.

Syntax

T | project-reorder ColumnNameOrPattern [asc | desc | granny-asc | granny-desc] [, …]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The input tabular data. |
| ColumnNameOrPattern | string | ✔️ | The name of the column or column wildcard pattern by which to order the columns. |
| asc, desc, granny-asc, granny-desc | string | | Indicates how to order the columns when a wildcard pattern is used. asc or desc orders columns by column name in ascending or descending manner, respectively. granny-asc or granny-desc orders by ascending or descending, respectively, while secondarily sorting by the next numeric value. For example, a20 comes before a100 when granny-asc is specified. |

Returns

A table that contains columns in the order specified by the operator arguments. project-reorder doesn't rename or remove columns from the table; therefore, all columns that existed in the source table appear in the result table.

Examples

The examples in this section show how to use the syntax to help you get started.

Reorder with b first

Reorder a table with three columns (a, b, c) so the second column (b) will appear first.

print a='a', b='b', c='c'
|  project-reorder b

Output

| b | a | c |
|---|---|---|
| b | a | c |

Reorder with a first

Reorder columns of a table so that columns starting with a will appear before other columns.

print b = 'b', a2='a2', a3='a3', a1='a1'
|  project-reorder a* asc

Output

| a1 | a2 | a3 | b |
|---|---|---|---|
| a1 | a2 | a3 | b |
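
Reorder with granny-asc

Reorder columns that share a prefix so that a numeric suffix is compared numerically rather than lexicographically. In this sketch (hypothetical column names), granny-asc places a2 before a20 and a20 before a100, whereas plain asc would order them a100, a2, a20:

print b='b', a20='a20', a100='a100', a2='a2'
|  project-reorder a* granny-asc

Output

| a2 | a20 | a100 | b |
|---|---|---|---|
| a2 | a20 | a100 | b |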

30 - Queries

Learn how to use queries to explore and process data in the context of databases.

A query is a read-only operation against data ingested into your cluster. Queries always run in the context of a particular database in the cluster. They may also refer to data in another database, or even in another cluster.

Because ad hoc querying of data is the top-priority scenario for Kusto, the Kusto Query Language syntax is optimized so that non-expert users can author and run queries over their data and unambiguously understand what each query does (logically).

The language syntax is that of a data flow, where "data" means "tabular data" (rectangular data arranged in one or more rows and columns). At a minimum, a query consists of source data references (references to Kusto tables) and one or more query operators applied in sequence, indicated visually by the use of a pipe character (|) to delimit operators.

For example:

StormEvents 
| where State == 'FLORIDA' and StartTime > datetime(2000-01-01)
| count

Each filter prefixed by the pipe character | is an instance of an operator, with some parameters. The input to the operator is the table that is the result of the preceding pipeline. In most cases, any parameters are scalar expressions over the columns of the input. In a few cases, the parameters are the names of input columns, and in a few cases, the parameter is a second table. The result of a query is always a table, even if it only has one column and one row.

T is used in queries to denote the preceding pipeline or source table.
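
A query can also reference another database or cluster explicitly with the database() and cluster() functions. A minimal sketch, assuming access to the public help cluster and its Samples database:

cluster('help').database('Samples').StormEvents
| where State == 'FLORIDA'
| count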

31 - range operator

Learn how to use the range operator to generate a single-column table of values.

Generates a single-column table of values.

Syntax

range columnName from start to stop step step

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| columnName | string | ✔️ | The name of the single column in the output table. |
| start | int, long, real, datetime, or timespan | ✔️ | The smallest value in the output. |
| stop | int, long, real, datetime, or timespan | ✔️ | The highest value generated in the output, or a bound on the highest value if step steps over this value. |
| step | int, long, real, datetime, or timespan | ✔️ | The difference between two consecutive values. |

Returns

A table with a single column called columnName, whose values are start, start + step, … up to and including stop.

Examples

The examples in this section show how to use the syntax to help you get started.

Range over the past seven days

The following example creates a table with entries for the current time stamp extended over the past seven days, once a day.

range LastWeek from ago(7d) to now() step 1d

Output

LastWeek
2015-12-05 09:10:04.627
2015-12-06 09:10:04.627
2015-12-12 09:10:04.627

Combine different stop times

The following example shows how to extend ranges to use multiple stop times by using the union operator.

let Range1 = range Time from datetime(2024-01-01) to datetime(2024-01-05) step 1d;
let Range2 = range Time from datetime(2024-01-06) to datetime(2024-01-10) step 1d;
union Range1, Range2
| order by Time asc

Output

Time
2024-01-01 00:00:00.0000000
2024-01-02 00:00:00.0000000
2024-01-03 00:00:00.0000000
2024-01-04 00:00:00.0000000
2024-01-05 00:00:00.0000000
2024-01-06 00:00:00.0000000
2024-01-07 00:00:00.0000000
2024-01-08 00:00:00.0000000
2024-01-09 00:00:00.0000000
2024-01-10 00:00:00.0000000

Range using parameters

The following example shows how to use the range operator with parameters, which are then extended and consumed as a table.

let toUnixTime = (dt:datetime) 
{ 
    (dt - datetime(1970-01-01)) / 1s 
};
let MyMonthStart = startofmonth(now()); //Start of month
let StepBy = 4.534h; //Supported timespans
let nn = 64000; // Row Count parametrized
let MyTimeline = range MyMonthHour from MyMonthStart to now() step StepBy
| extend MyMonthHourinUnixTime = toUnixTime(MyMonthHour), DateOnly = bin(MyMonthHour,1d), TimeOnly = MyMonthHour - bin(MyMonthHour,1d)
; MyTimeline | order by MyMonthHour asc | take nn

Output

| MyMonthHour | MyMonthHourinUnixTime | DateOnly | TimeOnly |
|---|---|---|---|
| 2023-02-01 00:00:00.0000000 | 1675209600 | 2023-02-01 00:00:00.0000000 | 00:00:00 |
| 2023-02-01 04:32:02.4000000 | 1675225922.4 | 2023-02-01 00:00:00.0000000 | 04:32:02.4000000 |
| 2023-02-01 09:04:04.8000000 | 1675242244.8 | 2023-02-01 00:00:00.0000000 | 09:04:04.8000000 |
| 2023-02-01 13:36:07.2000000 | 1675258567.2 | 2023-02-01 00:00:00.0000000 | 13:36:07.2000000 |

Incremented steps

The following example creates a table with a single column called Steps whose type is long and results in values from one to eight incremented by three.

range Steps from 1 to 8 step 3

Output

Steps
1
4
7

Traces over a time range

The following example shows how the range operator can be used to create a dimension table that is used to introduce zeros where the source data has no values. It takes timestamps from the last four hours and counts traces for each one-minute interval. When there are no traces for a specific interval, the count is zero.

range TIMESTAMP from ago(4h) to now() step 1m
| join kind=fullouter
  (Traces
      | where TIMESTAMP > ago(4h)
      | summarize Count=count() by bin(TIMESTAMP, 1m)
  ) on TIMESTAMP
| project Count=iff(isnull(Count), 0, Count), TIMESTAMP
| render timechart  

32 - reduce operator

Learn how to use the reduce operator to group a set of strings together based on value similarity.

Groups a set of strings together based on value similarity.

For each such group, the operator returns a pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard. The count is the number of values in the group, and the representative is one of the original values in the group.

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| Expr | string | ✔️ | The value by which to reduce. |
| Threshold | real | | A value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1. We recommend setting a small threshold value for large inputs. With a smaller threshold value, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value requires less similarity, resulting in more groups that are less similar. See Examples. |
| ReduceKind | string | | The only valid value is source. If source is specified, the operator appends the Pattern column to the existing rows in the table instead of aggregating by Pattern. |

Returns

A table with as many rows as there are groups and columns titled pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard, or placeholder for an arbitrary insertion string. The count is the number of values in the group, and the representative is one of the original values in the group.

For example, the result of reduce by city might include:

| Pattern | Count | Representative |
|---|---|---|
| San * | 5182 | San Bernard |
| Saint * | 2846 | Saint Lucy |
| Moscow | 3726 | Moscow |
| * -on- * | 2730 | One -on- One |
| Paris | 2716 | Paris |

Examples

The examples in this section show how to use the syntax to help you get started.

Small threshold value

This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.001 , characters = "X" 

Output

| Pattern | Count | Representative |
|---|---|---|
| MachineLearning* | 1000 | MachineLearningX4 |

Large threshold value

This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.9 , characters = "X" 

Output

With the larger threshold value, a reduction is triggered only when a pattern covers at least 90% of the rows. No digit group reaches that fraction, so the values aren't merged into a single group, and each digit suffix forms its own group.

| Pattern | Count | Representative |
|---|---|---|
| MachineLearning* | 177 | MachineLearningX9 |
| MachineLearning* | 102 | MachineLearningX0 |
| MachineLearning* | 106 | MachineLearningX1 |
| MachineLearning* | 96 | MachineLearningX6 |
| MachineLearning* | 110 | MachineLearningX4 |
| MachineLearning* | 100 | MachineLearningX3 |
| MachineLearning* | 99 | MachineLearningX8 |
| MachineLearning* | 104 | MachineLearningX7 |
| MachineLearning* | 106 | MachineLearningX2 |

Behavior of Characters parameter

If the Characters parameter is unspecified, then every character that isn't an ASCII alphanumeric character becomes a term separator.

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

| Pattern | Count | Representative |
|---|---|---|
| others | 10 | |

However, if you specify that “Z” is a separator, then it’s as if each value in str is two terms: foo and tostring(x):

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

Output

| Pattern | Count | Representative |
|---|---|---|
| foo* | 10 | fooZ1 |

Apply reduce to sanitized input

The following example shows how one might apply the reduce operator to a “sanitized” input, in which GUIDs in the column being reduced are replaced before reducing:

Start with a few records from the Trace table. Then reduce the Text column which includes random GUIDs. As random GUIDs interfere with the reduce operation, replace them all by the string “GUID”. Now perform the reduce operation. In case there are other “quasi-random” identifiers with embedded ‘-’ or ‘_’ characters in them, treat characters as non-term-breakers.

Trace
| take 10000
| extend Text = replace(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID", Text)
| reduce by Text with characters="-_"
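
Append patterns with kind=source

For comparison, a minimal sketch of kind=source on the same kind of data as the Characters examples above: instead of aggregating by Pattern, the operator keeps every input row and appends the Pattern column to it.

range x from 1 to 10 step 1
| project str = strcat("foo", "Z", tostring(x))
| reduce kind=source by str with characters="Z"

The result is the original 10 rows, each extended with a Pattern value of foo*.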

33 - sample operator

Learn how to use the sample operator to return up to the specified number of rows from the input table.

Returns up to the specified number of random rows from the input table.

Syntax

T | sample NumberOfRows

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The input tabular expression. |
| NumberOfRows | int, long, or real | ✔️ | The number of rows to return. You can specify any numeric expression. |

Examples

The examples in this section show how to use the syntax to help you get started.

Generate a sample

This query creates a range of numbers, samples one value, and then duplicates that sample.

let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)

Output

x
74
63

To ensure that _sample is calculated once in the example above, use the materialize() function:

let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)

Output

x
24
24

Generate a sample of a certain percentage of data

To sample a certain percentage of your data (rather than a specified number of rows), you can use

StormEvents | where rand() < 0.1

Output

The table contains the first few rows of the output. Run the query to view the full result.

| StartTime | EndTime | EpisodeId | EventId | State | EventType |
|---|---|---|---|---|---|
| 2007-01-01T00:00:00Z | 2007-01-20T10:24:00Z | 2403 | 11914 | INDIANA | Flood |
| 2007-01-01T00:00:00Z | 2007-01-24T18:47:00Z | 2408 | 11930 | INDIANA | Flood |
| 2007-01-01T00:00:00Z | 2007-01-01T12:00:00Z | 1979 | 12631 | DELAWARE | Heavy Rain |
| 2007-01-01T00:00:00Z | 2007-01-01T00:00:00Z | 2592 | 13208 | NORTH CAROLINA | Thunderstorm Wind |
| 2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7069 | MINNESOTA | Drought |
| 2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 2240 | 10858 | TEXAS | Drought |

Generate a sample of keys

To sample keys rather than rows (for example, sample 10 IDs and get all rows for these IDs), you can use sample-distinct in combination with the in operator.

let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)

Output

The table contains the first few rows of the output. Run the query to view the full result.

| StartTime | EndTime | EpisodeId | EventId | State | EventType |
|---|---|---|---|---|---|
| 2007-09-18T20:00:00Z | 2007-09-19T18:00:00Z | 11074 | 60904 | FLORIDA | Heavy Rain |
| 2007-09-20T21:57:00Z | 2007-09-20T22:05:00Z | 11078 | 60913 | FLORIDA | Tornado |
| 2007-09-29T08:11:00Z | 2007-09-29T08:11:00Z | 11091 | 61032 | ATLANTIC SOUTH | Waterspout |
| 2007-12-07T14:00:00Z | 2007-12-08T04:00:00Z | 13183 | 73241 | AMERICAN SAMOA | Flash Flood |
| 2007-12-11T21:45:00Z | 2007-12-12T16:45:00Z | 12826 | 70787 | KANSAS | Flood |
| 2007-12-13T09:02:00Z | 2007-12-13T10:30:00Z | 11780 | 64725 | KENTUCKY | Flood |

34 - sample-distinct operator

Learn how to use the sample-distinct operator to return a column that contains up to the specified number of distinct values of the requested columns.

Returns a single column that contains up to the specified number of distinct values of the requested column.

The operator tries to return an answer as quickly as possible rather than trying to make a fair sample.

Syntax

T | sample-distinct NumberOfValues of ColumnName

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The input tabular expression. |
| NumberOfValues | int, long, or real | ✔️ | The number of distinct values of T to return. You can specify any numeric expression. |
| ColumnName | string | ✔️ | The name of the column from which to sample. |

Examples

The examples in this section show how to use the syntax to help you get started.

Get 10 distinct values from a population

StormEvents | sample-distinct 10 of EpisodeId

Output

EpisodeId
11074
11078
11749
12554
12561
13183
11780
11781
12826

Further compute the sample values

let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents 
| where EpisodeId in (sampleEpisodes) 
| summarize totalInjuries=sum(InjuriesDirect) by EpisodeId

Output

| EpisodeId | totalInjuries |
|---|---|
| 11091 | 0 |
| 11074 | 0 |
| 11078 | 0 |
| 11749 | 0 |
| 12554 | 3 |
| 12561 | 0 |
| 13183 | 0 |
| 11780 | 0 |
| 11781 | 0 |
| 12826 | 0 |

35 - scan operator

Learn how to use the scan operator to scan data, match, and build sequences based on the predicates.

Scans data, matches, and builds sequences based on the predicates.

Matching records are determined according to predicates defined in the operator’s steps. A predicate can depend on the state that is generated by previous steps. The output for the matching record is determined by the input record and assignments defined in the operator’s steps.

Syntax

T | scan [ with_match_id = MatchIdColumnName ] [ declare ( ColumnDeclarations ) ] with ( StepDefinitions )

ColumnDeclarations syntax

ColumnName : ColumnType[= DefaultValue ] [, … ]

StepDefinition syntax

step StepName [ output = all | last | none] : Condition [ => Column = Assignment [, … ] ] ;

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The input tabular source. |
| MatchIdColumnName | string | | The name of a column of type long that is appended to the output as part of the scan execution. Indicates the 0-based index of the match for the record. |
| ColumnDeclarations | string | | Declares an extension to the schema of T. These columns are assigned values in the steps. If not assigned, the DefaultValue is returned. Unless otherwise specified, DefaultValue is null. |
| StepName | string | ✔️ | Used to reference values in the state of scan for conditions and assignments. The step name must be unique. |
| Condition | string | ✔️ | An expression that evaluates to true or false that defines which records from the input match the step. A record matches the step when the condition is true with the step's state or with the previous step's state. |
| Assignment | string | | A scalar expression that is assigned to the corresponding column when a record matches a step. |
| output | string | | Controls the output logic of the step on repeated matches. all outputs all records matching the step, last outputs only the last record in a series of repeating matches for the step, and none doesn't output records matching the step. The default is all. |

Returns

A record for each match of a record from the input to a step. The schema of the output is the schema of the source extended with the column in the declare clause.

Scan logic

scan goes over the serialized input data, record by record, comparing each record against each step’s condition while taking into account the current state of each step.

State

The underlying state of the scan operator can be thought of as a table with a row for each step. Each step maintains its own state with the latest values of the columns and declared variables from all of the previous steps and the current step. If relevant, it also holds the match ID for the ongoing sequence.

If a scan operator has n steps named s_1, s_2, …, s_n then step s_k would have k records in its state corresponding to s_1, s_2, …, s_k. The StepName.ColumnName format is used to reference a value in the state. For instance, s_2.col1 would reference column col1 that belongs to step s_2 in the state of s_k. For a detailed example, see the scan logic walkthrough.

The state starts empty and updates whenever a scanned input record matches a step. When the state of the current step is nonempty, the step is referred to as having an active sequence.

Matching logic

Each input record is evaluated against all of the steps in reverse order, from the last step to the first. When a record r is evaluated against some step s_k, the following logic is applied:

  • Check 1: If the state of the previous step (s_k-1) is nonempty, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:

    1. The state of s_k is cleared.
    2. The state of s_k-1 is promoted to become the state of s_k.
    3. The assignments of s_k are calculated and extend r.
    4. The extended r is added to the output and to the state of s_k.

    Note: If Check 1 results in a match, Check 2 is disregarded, and r moves on to be evaluated against s_k-1.

  • Check 2: If the state of s_k has an active sequence or s_k is the first step, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:

    1. The assignments of s_k are calculated and extend r.
    2. The values that represent s_k in the state of s_k are replaced with the values of the extended r.
    3. If s_k is defined as output=all, the extended r is added to the output.
    4. If s_k is the first step, a new sequence begins and the match ID increases by 1. This only affects the output when with_match_id is used.

Once the checks for s_k are complete, r moves on to be evaluated against s_k-1.

For a detailed example of this logic, see the scan logic walkthrough.

Examples

The examples in this section show how to use the syntax to help you get started.

Cumulative sum

Calculate the cumulative sum for an input column. The result of this example is equivalent to using row_cumsum().

range x from 1 to 5 step 1 
| scan declare (cumulative_x:long=0) with 
(
    step s1: true => cumulative_x = x + s1.cumulative_x;
)

Output

xcumulative_x
11
23
36
410
515

Cumulative sum on multiple columns with a reset condition

Calculate the cumulative sum for two input columns, resetting each sum to the current record's value whenever the cumulative sum reaches 10 or more.

range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with 
(
    step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x), 
                     cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)

Output

xycumulative_xcumulative_y
1212
2436
36612
48108
510518

Fill forward a column

Fill forward a string column. Each empty value is assigned the last seen nonempty value.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "",
    2m, "B",
    3m, "",
    4m, "",
    6m, "C",
    8m, "",
    11m, "D",
    12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with 
(
    step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)

Output

TsEventEvent_filled
00:00:00AA
00:01:00A
00:02:00BB
00:03:00B
00:04:00B
00:06:00CC
00:08:00C
00:11:00DD
00:12:00D

Sessions tagging

Divide the input into sessions: a session ends 30 minutes after the first event of the session, after which a new session starts. Note the use of the with_match_id flag, which assigns a unique value to each distinct match (session) found by scan. Also note the special use of two steps in this example: inSession has true as its condition, so it captures and outputs all the records from the input, while endSession captures records that occur more than 30 minutes after the sessionStart value for the current match. endSession is defined with output=none, meaning it doesn't produce output records. The endSession step is used to advance the state of the current match from inSession to endSession, allowing a new match (session) to begin, starting from the current record.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "A",
    2m, "B",
    3m, "D",
    32m, "B",
    36m, "C",
    38m, "D",
    41m, "E",
    75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with 
(
    step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
    step endSession output=none: Ts - inSession.sessionStart > 30m;
)

Output

TsEventsessionStartsession_id
00:00:00A00:00:000
00:01:00A00:00:000
00:02:00B00:00:000
00:03:00D00:00:000
00:32:00B00:32:001
00:36:00C00:32:001
00:38:00D00:32:001
00:41:00E00:32:001
01:15:00A01:15:002

Events between Start and Stop

Find all sequences of events between the event Start and the event Stop that occur within 5 minutes. Assign a match ID for each sequence.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

Output

TsEventm_id
00:01:00Start0
00:02:00B0
00:03:00D0
00:04:00Stop0
00:08:00Start1
00:11:00E1
00:12:00Stop1

Calculate a custom funnel of events

Calculate a funnel completion of the sequence Hail -> Tornado -> Thunderstorm Wind by State with custom thresholds on the times between the events (Tornado within 1h and Thunderstorm Wind within 2h). This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.

StormEvents
| partition hint.strategy=native by State 
    (
    sort by StartTime asc
    | scan with 
    (
        step hail: EventType == "Hail";
        step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
        step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
    )
    )
| summarize dcount(State) by EventType

Output

EventTypedcount_State
Hail50
Tornado34
Thunderstorm Wind32

Scan logic walkthrough

This section demonstrates the scan logic using a step-by-step walkthrough of the Events between Start and Stop example:

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

Output

TsEventm_id
00:01:00Start0
00:02:00B0
00:03:00D0
00:04:00Stop0
00:08:00Start1
00:11:00E1
00:12:00Stop1

The state

Think of the state of the scan operator as a table with a row for each step, in which each step has its own state. This state contains the latest values of the columns and declared variables from all of the previous steps and the current step. To learn more, see State.

For this example, the state can be represented with the following table:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3

The “X” indicates that a specific field is irrelevant for that step.

The matching logic

This section follows the matching logic through each record of the Events table, explaining the transformation of the state and output at each step.

Record 1

TsEvent
0m“A”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start". Record 1 is discarded without affecting the state or output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3

Record 2

TsEvent
1m“Start”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence, and the m_id is assigned. Record 2 and its m_id (0) are added to the state and the output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1000:01:00“Start”XXXX
s2XX
s3

Record 3

TsEvent
2m“B”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 3 and its m_id (0) are added to the state and the output.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2000:01:00“Start”00:02:00“B”XX
s3

Record 4

TsEvent
3m“D”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the record doesn’t meet the condition of Event == "Stop", and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 isn’t passed because the state of s1 is empty. Check 2 is passed because the record meets the condition of Ts - s1.Ts <= 5m. Record 4 and its m_id (0) are added to the state and the output. The values from this record overwrite the previous state values for s2.Ts and s2.Event.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2000:01:00“Start”00:03:00“D”XX
s3

Record 5

TsEvent
4m“Stop”

Record evaluation at each step:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 5 and its m_id (0) are added to the state and the output.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 6

TsEvent
6m“C”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because the record doesn’t meet the s3 condition of Event == "Stop".
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start". Record 6 is discarded without affecting the state or output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 7

TsEvent
8m“Start”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because it doesn’t meet the condition of Event == "Stop".
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence in s1 with a new m_id. Record 7 and its m_id (1) are added to the state and the output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1100:08:00“Start”XXXX
s2XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 8

TsEvent
11m“E”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because it doesn’t meet the s3 condition of Event == "Stop".
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 8 and its m_id (1) are added to the state and the output.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2100:08:00“Start”00:11:00“E”XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 9

TsEvent
12m“Stop”

Record evaluation at each step:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 9 and its m_id (1) are added to the state and the output.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3100:08:00“Start”00:11:00“E”00:12:00“Stop”

Final output

TsEventm_id
00:01:00Start0
00:02:00B0
00:03:00D0
00:04:00Stop0
00:08:00Start1
00:11:00E1
00:12:00Stop1

36 - search operator

Learn how to use the search operator to search for a text pattern in multiple tables and columns.

Searches a text pattern in multiple tables and columns.

Syntax

[T |] search [kind= CaseSensitivity ] [in (TableSources)] SearchPredicate

Parameters

NameTypeRequiredDescription
TstringThe tabular data source to be searched over, such as a table name, a union operator, or the results of a tabular query. Can’t be specified together with TableSources.
CaseSensitivitystringA flag that controls the behavior of all string scalar operators, such as has, with respect to case sensitivity. Valid values are default, case_insensitive, case_sensitive. The options default and case_insensitive are synonymous, since the default behavior is case insensitive.
TableSourcesstringA comma-separated list of “wildcarded” table names to take part in the search. The list has the same syntax as the list of the union operator. Can’t be specified together with tabular data source (T).
SearchPredicatestring✔️A boolean expression to be evaluated for every record in the input. If it returns true, the record is output. See Search predicate syntax.

Search predicate syntax

The SearchPredicate allows you to search for specific terms in all columns of a table. The operator that is applied to a search term depends on the presence and placement of a wildcard asterisk (*) in the term, as shown in the following table.

LiteralOperator
billghas
*billghassuffix
billg*hasprefix
*billg*contains
bi*lgmatches regex

You can also restrict the search to a specific column, look for an exact match instead of a term match, or search by regular expression. The syntax for each of these cases is shown in the following table.

SyntaxExplanation
ColumnName:StringLiteralThis syntax can be used to restrict the search to a specific column. The default behavior is to search all columns.
ColumnName==StringLiteralThis syntax can be used to search for exact matches of a column against a string value. The default behavior is to look for a term-match.
Column matches regex StringLiteralThis syntax indicates regular expression matching, in which StringLiteral is the regex pattern.

Use boolean expressions to combine conditions and create more complex searches. For example, "error" and x==123 would result in a search for records that have the term error in any columns and the value 123 in the x column.
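
For example, the following sketch (assuming the StormEvents sample table used elsewhere in this article) combines a term search across all columns with a column comparison:

StormEvents
| search "flood" and DamageProperty > 0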

Search predicate syntax examples

#SyntaxMeaning (equivalent where)Comments
1search "err"where * has "err"
2search in (T1,T2,A*) "err"union T1,T2,A* | where * has “err”
3search col:"err"where col has "err"
4search col=="err"where col=="err"
5search "err*"where * hasprefix "err"
6search "*err"where * hassuffix "err"
7search "*err*"where * contains "err"
8search "Lab*PC"where * matches regex @"\bLab.*PC\b"
9search *where 0==0
10search col matches regex "..."where col matches regex "..."
11search kind=case_sensitiveAll string comparisons are case-sensitive
12search "abc" and ("def" or "hij")where * has "abc" and (* has "def" or * has "hij")
13search "err" or (A>a and A<b)where * has "err" or (A>a and A<b)

Remarks

Unlike the find operator, the search operator doesn’t support the following syntax:

  1. withsource=: The output always includes a column called $table of type string whose value is the table name from which each record was retrieved (or some system-generated name if the source isn’t a table but a composite expression).
  2. project=, project-smart: The output schema is equivalent to project-smart output schema.
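
Because the output always includes the $table column, you can filter on it after the search. A minimal sketch, assuming the ContosoSales tables used in the following examples:

search "Green"
| where $table == "Products"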

Examples

The examples in this section show how to use the syntax to help you get started.

Search for the term Green in all the tables of the ContosoSales database.

The output finds records with the term Green as a last name or a color in the Customers, Products, and SalesTable tables.

 search "Green"

Output

$tableCityNameContinentNameCustomerKeyEducationFirstNameGenderLastName
CustomersBallardNorth America16549Partial CollegeMasonMGreen
CustomersBellinghamNorth America2070High SchoolAdamMGreen
CustomersBellinghamNorth America10658BachelorsSaraFGreen
CustomersBeverly HillsNorth America806Graduate DegreeRichardMGreen
CustomersBeverly HillsNorth America7674Graduate DegreeJamesMGreen
CustomersBurbankNorth America5241Graduate DegreeMadelineFGreen

Search for records that contain the term Green and one of either terms Deluxe or Proseware in the ContosoSales database.

search "Green" and ("Deluxe" or "Proseware")

Output

$tableProductNameManufacturerColorNameClassNameProductCategoryName
ProductsContoso 8GB Clock & Radio MP3 Player X850 GreenContoso, LtdGreenDeluxeAudio
ProductsProseware Scan Jet Digital Flat Bed Scanner M300 GreenProseware, Inc.GreenRegularComputers
ProductsProseware All-In-One Photo Printer M200 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Ink Jet Wireless All-In-One Printer M400 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Ink Jet Instant PDF Sheet-Fed Scanner M300 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Desk Jet All-in-One Printer, Scanner, Copier M350 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Duplex Scanner M200 GreenProseware, Inc.GreenRegularComputers

Search a specific table

Search for the term Green only in the Products table.

search in (Products) "Green"

Output

$tableProductNameManufacturerColorName
ProductsContoso 4G MP3 Player E400 GreenContoso, LtdGreen
ProductsContoso 8GB Super-Slim MP3/Video Player M800 GreenContoso, LtdGreen
ProductsContoso 16GB Mp5 Player M1600 GreenContoso, LtdGreen
ProductsContoso 8GB Clock & Radio MP3 Player X850 GreenContoso, LtdGreen
ProductsNT Wireless Bluetooth Stereo Headphones M402 GreenNorthwind TradersGreen
ProductsNT Wireless Transmitter and Bluetooth Headphones M150 GreenNorthwind TradersGreen

Search for records that match the case-sensitive term in the ContosoSales database.

search kind=case_sensitive "blue"

Output

$tableProductNameManufacturerColorNameClassName
ProductsContoso 16GB New Generation MP5 Player M1650 blueContoso, LtdblueRegular
ProductsContoso Bright Light battery E20 blueContoso, LtdblueEconomy
ProductsLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy

Search specific columns

Search for the terms Aaron and Hughes, in the “FirstName” and “LastName” columns respectively, in the ContosoSales database.

search FirstName:"Aaron" or LastName:"Hughes"

Output

$tableCustomerKeyEducationFirstNameGenderLastName
Customers18285High SchoolRileyFHughes
Customers802Graduate DegreeAaronMSharma
Customers986BachelorsMelanieFHughes
Customers12669High SchoolJessicaFHughes
Customers13436Graduate DegreeMariahFHughes
Customers10152Graduate DegreeAaronMCampbell

Limit search by timestamp

Search for the term Hughes in the ContosoSales database, where the term appears in a record with a date later than the given datetime.

search "Hughes" and DateKey > datetime('2009-01-01')

Output

$tableDateKeySalesAmount_real
SalesTable2021-12-13T00:00:00Z446.4715
SalesTable2021-12-13T00:00:00Z120.555
SalesTable2021-12-13T00:00:00Z48.4405
SalesTable2021-12-13T00:00:00Z39.6435
SalesTable2021-12-13T00:00:00Z56.9905

Performance tips

#TipPreferOver
1Prefer to use a single search operator over several consecutive search operatorssearch "billg" and ("steveb" or "satyan")search "billg" | search "steveb" or "satyan"
2Prefer to filter inside the search operatorsearch "billg" and "steveb"search * | where * has "billg" and * has "steveb"

37 - serialize operator

Learn how to use the serialize operator to mark the input row set as serialized and ready for window functions.

Marks that the order of the input row set is safe to use for window functions.

The operator has a declarative meaning. It marks the input row set as serialized (ordered), so that window functions can be applied to it.

Syntax

serialize [Name1 = Expr1 [, Name2 = Expr2]…]

Parameters

NameTypeRequiredDescription
NamestringThe name of the column to add or update. If omitted, the output column name is automatically generated.
Exprstring✔️The calculation to perform over the input.

Examples

The examples in this section show how to use the syntax to help you get started.

Serialize subset of rows by condition

This query retrieves all log entries from the TraceLogs table that have a specific ClientRequestId and preserves the order of these entries during processing.

TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize

Output

This table only shows the top 5 query results.

TimestampNodeComponentClientRequestIdMessage
2014-03-08T12:24:55.5464757ZEngine000000000757INGESTOR_GATEWAY5a848f70-9996-eb17-15ed-21b8eb94bf0e$$IngestionCommand table=fogEvents format=json
2014-03-08T12:24:56.0929514ZEngine000000000757DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"
2014-03-08T12:25:40.3574831ZEngine000000000341INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"
2014-03-08T12:25:40.9039588ZEngine000000000341DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"
2014-03-08T12:26:25.1684905ZEngine000000000057INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"

Add row number to the serialized table

To add a row number to the serialized table, use the row_number() function.

TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize rn = row_number()

Output

This table only shows the top 5 query results.

TimestamprnNodeComponentClientRequestIdMessage
2014-03-08T13:00:01.6638235Z1Engine000000000899INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_46.json.gz"
2014-03-08T13:00:02.2102992Z2Engine000000000899DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"
2014-03-08T13:00:46.4748309Z3Engine000000000584INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"
2014-03-08T13:00:47.0213066Z4Engine000000000584DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"
2014-03-08T13:01:31.2858383Z5Engine000000000380INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"

Serialization behavior of operators

The output row set of the following operators is marked as serialized: getschema, range, sort, top, top-hitters.

The output row set of the following operators is marked as nonserialized: count, distinct, evaluate, facet, join, make-series, mv-expand, reduce by, sample, sample-distinct, summarize, top-nested.

All other operators preserve the serialization property. If the input row set is serialized, then the output row set is also serialized.
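
For example, in the following sketch (assuming the StormEvents sample table), sort marks its output as serialized and extend preserves that property, so the window function prev() can be applied:

StormEvents
| sort by StartTime asc
| extend PreviousState = prev(State)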

38 - Shuffle query

This article describes Shuffle query.

The shuffle query is a semantic-preserving transformation used with a set of operators that support the shuffle strategy. Depending on the data involved, querying with the shuffle strategy can yield better performance. It’s better to use the shuffle query strategy when the shuffle key (a join key, summarize key, make-series key or partition key) has a high cardinality and the regular operator query hits query limits.

You can use the following operators with the shuffle command: join, summarize, make-series, and partition.

To use the shuffle query strategy, add the expression hint.strategy = shuffle or hint.shufflekey = <key>. When you use hint.strategy=shuffle, the operator data will be shuffled by all the keys. Use this expression when the compound key is unique but each key isn’t unique enough, so you’ll shuffle the data using all the keys of the shuffled operator.

When partitioning data with the shuffle strategy, the data load is shared on all cluster nodes. Each node processes one partition of the data. The default number of partitions is equal to the number of cluster nodes.

The partition number can be overridden by using the syntax hint.num_partitions = total_partitions, which controls the number of partitions. This is useful when the cluster has a small number of nodes, making the default partition count small, and the query fails or takes a long time to execute.
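
For example, the following sketch (the grouping key is illustrative) overrides the default and shuffles the data into 10 partitions:

StormEvents
| summarize hint.strategy = shuffle hint.num_partitions = 10 count() by EventType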

In some cases, the hint.strategy = shuffle is ignored, and the query won’t run in shuffle strategy. This can happen when:

  • The join operator has another shuffle-compatible operator (join, summarize, make-series or partition) on the left side or the right side.
  • The summarize operator appears after another shuffle-compatible operator (join, summarize, make-series or partition) in the query.

Syntax

With hint.strategy = shuffle

T | DataExpression | join hint.strategy = shuffle ( DataExpression )

T | summarize hint.strategy = shuffle DataExpression

T | Query | partition hint.strategy = shuffle ( SubQuery )

With hint.shufflekey = key

T | DataExpression | join hint.shufflekey = key ( DataExpression )

T | summarize hint.shufflekey = key DataExpression

T | make-series hint.shufflekey = key DataExpression

T | Query | partition hint.shufflekey = key ( SubQuery )

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular source whose data is to be processed by the operator.
DataExpressionstringAn implicit or explicit tabular transformation expression.
QuerystringA transformation expression run on the records of T.
keystringUse a join key, summarize key, make-series key or partition key.
SubQuerystringA transformation expression.

Examples

The examples in this section show how to use the syntax to help you get started.

Use summarize with shuffle

The shuffle strategy query with summarize operator shares the load on all cluster nodes, where each node processes one partition of the data.

StormEvents
| summarize hint.strategy = shuffle count(), avg(InjuriesIndirect) by State
| count 

Output

Count
67

Use join with shuffle

StormEvents
| where State has "West"
| where EventType has "Flood"
| join hint.strategy=shuffle 
    (
    StormEvents
    | where EventType has "Hail"
    | project EpisodeId, State, DamageProperty
    )
    on State
| count

Output

Count
103

Use make-series with shuffle

StormEvents
| where State has "North"
| make-series hint.shufflekey = State sum(DamageProperty) default = 0 on StartTime in range(datetime(2007-01-01 00:00:00.0000000), datetime(2007-01-31 23:59:00.0000000), 15d) by State

Output

Statesum_DamagePropertyStartTime
NORTH DAKOTA[60000,0,0][“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”]
NORTH CAROLINA[20000,0,1000][“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”]
ATLANTIC NORTH[0,0,0][“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”]

Use partition with shuffle

StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
)
| count

Output

Count
22345

Compare hint.strategy=shuffle and hint.shufflekey=key

When you use hint.strategy=shuffle, the shuffled operator will be shuffled by all the keys. In the following example, the query shuffles the data using both EpisodeId and EventId as keys:

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
| count

Output

Count
14

The following query uses hint.shufflekey = key and is equivalent to the previous query.

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.shufflekey = EpisodeId hint.shufflekey = EventId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId

Output

Count
14

Shuffle the data with multiple keys

In some cases, the hint.strategy=shuffle will be ignored, and the query won’t run in shuffle strategy. For example, in the following query, the join has a summarize on its left side, so using hint.strategy=shuffle won’t apply the shuffle strategy to the query:

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId

Output

EpisodeIdEventIdEpisodeId1EventId1
1030440710304407
103013721103013721
247712530247712530
210310237210310237
210310239210310239

To overcome this issue and run in shuffle strategy, choose the key that is common to the summarize and join operations. In this case, that key is EpisodeId. Use hint.shufflekey = EpisodeId to specify the shuffle key on the join:

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.shufflekey=EpisodeId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId

Output

EpisodeIdEventIdEpisodeId1EventId1
1030440710304407
103013721103013721
247712530247712530
210310237210310237
210310239210310239

Use summarize with shuffle to improve performance

In this example, using the summarize operator with shuffle strategy improves performance. The source table has 150M records and the cardinality of the group by key is 10M, which is spread over 10 cluster nodes.

Using summarize operator without shuffle strategy, the query ends after 1:08 and the memory usage peak is ~3 GB:

orders
| summarize arg_max(o_orderdate, o_totalprice) by o_custkey 
| where o_totalprice < 1000
| count

Output

Count
1086

While using shuffle strategy with summarize, the query ends after ~7 seconds and the memory usage peak is 0.43 GB:

orders
| summarize hint.strategy = shuffle arg_max(o_orderdate, o_totalprice) by o_custkey 
| where o_totalprice < 1000
| count

Output

Count
1086

The following example demonstrates performance on a cluster that has two cluster nodes, with a table that has 60M records, where the cardinality of the group by key is 2M.

Running the query without hint.num_partitions uses only two partitions (matching the number of cluster nodes), and the following query takes ~1:10 minutes:

lineitem 
| summarize hint.strategy = shuffle dcount(l_comment), dcount(l_shipdate) by l_partkey 
| consume

Setting the number of partitions to 10, the query ends after 23 seconds:

lineitem 
| summarize hint.strategy = shuffle hint.num_partitions = 10 dcount(l_comment), dcount(l_shipdate) by l_partkey 
| consume

Use join with shuffle to improve performance

The following example shows how using shuffle strategy with the join operator improves performance.

The examples were sampled on a cluster with 10 nodes where the data is spread over all these nodes.

The query’s left-side source table has 15M records where the cardinality of the join key is ~14M. The query’s right-side source has 150M records and the cardinality of the join key is 10M. The query ends after ~28 seconds and the memory usage peak is 1.43 GB:

customer
| join
    orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey

When using shuffle strategy with a join operator, the query ends after ~4 seconds and the memory usage peak is 0.3 GB:

customer
| join
    hint.strategy = shuffle orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey

In another example, we try the same queries on a larger dataset with the following conditions:

  • Left-side source of the join is 150M and the cardinality of the key is 148M.
  • Right-side source of the join is 1.5B, and the cardinality of the key is ~100M.

The query with just the join operator hits limits and times out after 4 minutes. However, when using shuffle strategy with the join operator, the query ends after ~34 seconds and the memory usage peak is 1.23 GB.

The following example shows the improvement on a cluster that has two cluster nodes, with a table of 60M records, where the cardinality of the join key is 2M. Running the query without hint.num_partitions uses only two partitions (matching the number of cluster nodes), and the following query takes ~1:10 minutes:

lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
    hint.shufflekey = l_partkey   part
on $left.l_partkey == $right.p_partkey
| consume

When the number of partitions is set to 10, the query ends after 23 seconds:

lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
    hint.shufflekey = l_partkey  hint.num_partitions = 10    part
on $left.l_partkey == $right.p_partkey
| consume

39 - sort operator

Learn how to use the sort operator to sort the rows of the input table by one or more columns.

Sorts the rows of the input table into order by one or more columns.

Syntax

T | sort by column [asc | desc] [nulls first | nulls last] [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to sort.
columnscalar✔️The column of T by which to sort. The type of the column values must be numeric, date, time or string.
asc or descstringasc sorts into ascending order, low to high. Default is desc, high to low.
nulls first or nulls laststringnulls first will place the null values at the beginning and nulls last will place the null values at the end. Default for asc is nulls first. Default for desc is nulls last.

Returns

A copy of the input table sorted in either ascending or descending order based on the provided column.

Using special floating-point values

When the input table contains the special values null, NaN, -inf and +inf, the order will be as follows:

ValueAscendingDescending
Nulls firstnull,NaN,-inf,-5,0,5,+infnull,NaN,+inf,5,0,-5,-inf
Nulls last-inf,-5,0,5,+inf,NaN,null+inf,5,0,-5,-inf,NaN,null
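
The following sketch demonstrates this ordering on a hypothetical single-column table that contains a null and the special values:

datatable(x: real) [real(null), real(+inf), real(-inf), real(nan), 5.0, -5.0, 0.0]
| sort by x asc nulls first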

Example

The following example shows storm events by state in alphabetical order with the most recent storms in each state appearing first.

StormEvents
| sort by State asc, StartTime desc

Output

This table only shows the top 10 query results.

StartTimeStateEventType
2007-12-28T12:10:00ZALABAMAHail
2007-12-28T04:30:00ZALABAMAHail
2007-12-28T04:16:00ZALABAMAHail
2007-12-28T04:15:00ZALABAMAHail
2007-12-28T04:13:00ZALABAMAHail
2007-12-21T14:30:00ZALABAMAStrong Wind
2007-12-20T18:15:00ZALABAMAStrong Wind
2007-12-20T18:00:00ZALABAMAStrong Wind
2007-12-20T18:00:00ZALABAMAStrong Wind
2007-12-20T17:45:00ZALABAMAStrong Wind
2007-12-20T17:45:00ZALABAMAStrong Wind

40 - take operator

Learn how to use the take operator to return a specified number of rows.

Return up to the specified number of rows.

There is no guarantee which records are returned, unless the source data is sorted. If the data is sorted, then the top values will be returned.

Syntax

take NumberOfRows

Parameters

NameTypeRequiredDescription
NumberOfRowsint✔️The number of rows to return.

Paging of query results

Methods for implementing paging include:

  • Export the result of a query to an external storage and paging through the generated data.
  • Write a middle-tier application that provides a stateful paging API by caching the results of a Kusto query.
  • Use pagination in Stored query results

Example

StormEvents | take 5
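
Because take without a preceding sort doesn’t guarantee which rows are returned, you can sort first to make the result deterministic. A sketch:

StormEvents
| sort by StartTime desc
| take 5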

41 - top operator

Learn how to use the top operator to return the first specified number of records sorted by the specified column.

Returns the first N records sorted by the specified column.

Syntax

T | top NumberOfRows by Expression [asc | desc] [nulls first | nulls last]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to sort.
NumberOfRowsint✔️The number of rows of T to return.
Expressionstring✔️The scalar expression by which to sort.
asc or descstringControls whether the selection is from the “bottom” or “top” of the range. Default desc.
nulls first or nulls laststringControls whether null values appear at the “bottom” or “top” of the range. Default for asc is nulls first. Default for desc is nulls last.
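
For instance, the following sketch selects the three rows with the smallest values while pushing nulls to the end:

StormEvents
| top 3 by InjuriesDirect asc nulls last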

Example

Show top three storms with most direct injuries.

StormEvents
| top 3 by InjuriesDirect

The following table shows only the relevant column. Run the query above to see more storm details for these events.

InjuriesDirect
519
422
200
  • Use top-nested operator to produce hierarchical (nested) top results.

42 - top-hitters operator

Learn how to use the top-hitters operator to return an approximation for the most popular distinct values in the input.

Returns an approximation for the most popular distinct values, or the values with the largest sum, in the input.

Syntax

T | top-hitters NumberOfValues of ValueExpression [ by SummingExpression ]

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
NumberOfValuesint, long, or real✔️The number of distinct values of ValueExpression.
ValueExpressionstring✔️An expression over the input table T whose distinct values are returned.
SummingExpressionstringIf specified, a numeric expression over the input table T whose sum per distinct value of ValueExpression establishes which values to emit. If not specified, the count of each distinct value of ValueExpression is used instead.

Remarks

The first syntax (no SummingExpression) is conceptually equivalent to:

T | summarize C=count() by ValueExpression | top NumberOfValues by C desc

The second syntax (with SummingExpression) is conceptually equivalent to:

T | summarize S=sum(SummingExpression) by ValueExpression | top NumberOfValues by S desc

Examples

Get most frequent items

StormEvents
| top-hitters 5 of EventType 

Output

EventTypeapproximate_count_EventType
Thunderstorm Wind13015
Hail12711
Flash Flood3688
Drought3616
Winter Weather3349

Get top hitters based on column value

The next example shows how to find the States with the most “Thunderstorm Wind” events.

StormEvents
| where EventType == "Thunderstorm Wind"
| top-hitters 10 of State 

Output

Stateapproximate_count_State
TEXAS830
GEORGIA609
MICHIGAN602
IOWA585
PENNSYLVANIA549
ILLINOIS533
NEW YORK502
VIRGINIA482
KANSAS476
OHIO455
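
To rank by a sum instead of a count, add a SummingExpression. For example, the following sketch returns the three states with the largest total property damage:

StormEvents
| top-hitters 3 of State by DamageProperty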

43 - top-nested operator

Learn how to use the top-nested operator to produce a hierarchical aggregation.

The top-nested operator performs hierarchical aggregation and value selection.

Imagine you have a table with sales information like regions, salespeople, and amounts sold. The top-nested operator can help you answer complex questions, such as “What are the top five regions by sales, and who are the top three salespeople in each of those regions?”

The source data is partitioned based on the criteria set in the first top-nested clause, such as region. Next, the operator picks the top records in each partition using an aggregation, such as adding sales amounts. Each subsequent top-nested clause refines the partitions created by the previous clause, creating a hierarchy of more precise groups.

The result is a table with two columns per clause. One column holds the partitioning values, such as region, while the other column holds the outcomes of the aggregation calculation, like the sum of sales.

Syntax

T | top-nested [ N ] of Expr [with others = ConstExpr] by Aggregation [asc | desc] [,
  top-nested … ]

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
NintThe number of top values to be returned for this hierarchy level. If omitted, all distinct values are returned.
Exprstring✔️An expression over the input record indicating which value to return for this hierarchy level. Typically, it refers to a column from T or involves a calculation like bin() on a column. Optionally, set an output column name as Name = Expr.
ConstExprstringIf specified, for each hierarchy level, one record is added with the value that is the aggregation over all records that didn’t make it to the top.
AggregationstringThe aggregation function applied to records with the same Expr value. The result determines the top records. See Supported aggregation functions. Optionally, set an output column name as Name = Aggregation.

Supported aggregation functions

The following aggregation functions are supported: sum(), count(), max(), min(), dcount(), avg(), percentile(), and percentilew(). Any algebraic combination of these aggregations is also supported.

Returns

A table with two columns for each clause. One column contains unique values computed using Expr, and the other column shows the results obtained from the Aggregation calculation.

Using the with others clause

Using the top-nested operator with with others adds the ability to see your top content contextualized in a wider data set. Evaluating your data in this way is valuable when rendering the data visually.

Include data from other columns

Only columns specified as a top-nested clause Expr are displayed in the output table.

To include all values of a column at a specific level:

  1. Don’t specify the value of N.
  2. Use the column name as the value of Expr.
  3. Use Ignore=max(1) as the value of Aggregation.
  4. Remove the unnecessary Ignore column with project-away.

For an example, see Most recent events per state with other column data.

Performance considerations

The number of records can grow exponentially with the number of top-nested clauses, and record growth is even faster if the N parameter is not specified. This operator can consume a considerable amount of resources.

If the aggregation distribution is irregular, limit the number of distinct values to return by specifying N. Then, use the with others = ConstExpr clause to get a sense of the weight of all other cases.

Examples

Top damaged states, event types, and end locations by property damage

The following query partitions the StormEvents table by the State column and calculates the total property damage for each state. The query selects the top two states with the largest amount of property damage. Within these top two states, the query groups the data by EventType and selects the top three event types with the most damage. Then the query groups the data by EndLocation and selects the EndLocation with the highest damage. Only one EndLocation value appears in the results, possibly because of the large scale of the storm events or because the end location wasn’t documented.

StormEvents  // Data source.
| top-nested 2 of State by sum(DamageProperty),       // Top 2 States by total damaged property.
  top-nested 3 of EventType by sum(DamageProperty),   // Top 3 EventType by total damaged property for each State.
  top-nested 1 of EndLocation by sum(DamageProperty)  // Top 1 EndLocation by total damaged property for each EventType and State.
| project State, EventType, EndLocation, StateTotalDamage = aggregated_State, EventTypeTotalDamage = aggregated_EventType, EndLocationDamage = aggregated_EndLocation

Output

StateEventTypeEndLocationStateTotalDamageEventTypeTotalDamageEndLocationDamage
CALIFORNIAWildfire144593760013263150001326315000
CALIFORNIAHighWind14459376006132000061320000
CALIFORNIADebrisFlow14459376004800000048000000
OKLAHOMAIceStorm915470300826000000826000000
OKLAHOMAWinterStorm9154703004002700040027000
OKLAHOMAFloodCOMMERCE9154703002148500020000000

Top five states with property damage with others grouped

The following example uses the top-nested operator to identify the top five states with the most property damage and uses the with others clause to group damaged property for all other states. It then visualizes damaged property for the top five states and all other states as a piechart using the render command.

StormEvents
| top-nested 5 of State with others="OtherStates" by sum(DamageProperty)
| render piechart  

Output

Screenshot of the top five states with the most property damaged, and all other states grouped separately rendered as a pie-chart.

Most recent events per state with other column data

The following query retrieves the two most recent events for each US state with relevant event details. It uses max(1) within certain columns to propagate data without using the top-nested selection logic. The generated Ignore aggregation columns are removed using project-away.

StormEvents
| top-nested of State by Ignore0=max(1),                  // Partition the data by each unique value of state.
  top-nested 2 of StartTime by Ignore1=max(StartTime),    // Get the 2 most recent events in each state.
  top-nested of EndTime by Ignore2=max(1),                // Append the EndTime for each event.
  top-nested of EpisodeId by Ignore3=max(1)               // Append the EpisodeId for each event.
| project-away Ignore*                                    // Remove the unnecessary aggregation columns.
| order by State asc, StartTime desc                      // Sort results alphabetically and chronologically.

Latest records per identity with other column data

The following top-nested example extracts the latest records per identity and builds on the concepts introduced in the previous example. The first top-nested clause partitions the data by distinct values of id using Ignore0=max(1) as a placeholder. For each id, it identifies the two most recent records based on the timestamp. Other information is appended using a top-nested operator without specifying a count and using Ignore2=max(1) as a placeholder. Finally, unnecessary aggregation columns are removed using the project-away operator.

datatable(id: string, timestamp: datetime, otherInformation: string) // Create a source datatable.
[
    "Barak", datetime(2015-01-01), "1",
    "Barak", datetime(2016-01-01), "2",
    "Barak", datetime(2017-01-20), "3",
    "Donald", datetime(2017-01-20), "4",
    "Donald", datetime(2017-01-18), "5",
    "Donald", datetime(2017-01-19), "6"
]
| top-nested of id by Ignore0=max(1),                     // Partition the data by each unique value of id.
  top-nested 2 of timestamp by Ignore1=max(timestamp),    // Get the 2 most recent events for each id.
  top-nested of otherInformation by Ignore2=max(1)        // Append otherInformation for each event.
| project-away Ignore0, Ignore1, Ignore2                  // Remove the unnecessary aggregation columns.

Output

idtimestampotherInformation
Barak2016-01-01T00:00:00Z2
Donald2017-01-19T00:00:00Z6
Barak2017-01-20T00:00:00Z3
Donald2017-01-20T00:00:00Z4

44 - union operator

This article describes union operator.

Takes two or more tables and returns the rows of all of them.

Syntax

[ T | ] union [ UnionParameters ] [kind= inner|outer] [withsource= ColumnName] [isfuzzy= true|false] Tables

[ T | ] union [kind= inner|outer] [withsource= ColumnName] [isfuzzy= true|false] Tables

Parameters

NameTypeRequiredDescription
TstringThe input tabular expression.
UnionParametersstringZero or more space-separated parameters in the form of Name = Value that control the behavior of the row-match operation and execution plan. See supported union parameters.
kindstringEither inner or outer. inner causes the result to have the subset of columns that are common to all of the input tables. outer causes the result to have all the columns that occur in any of the inputs. Cells that aren’t defined by an input row are set to null. The default is outer.

With outer, the result has all the columns that occur in any of the inputs, one column for each distinct name and type occurrence. This means that if a column appears in multiple tables and has multiple types, it has a corresponding column for each type in the union’s result. This column name is suffixed with a ‘_’ followed by the origin column type.
withsource=ColumnNamestringIf specified, the output includes a column called ColumnName whose value indicates which source table has contributed each row. If the query effectively references tables from more than one database including the default database, then the value of this column has a table name qualified with the database. cluster and database qualifications are present in the value if more than one cluster is referenced.
isfuzzyboolIf set to true, allows fuzzy resolution of union legs. The set of union sources is reduced to the set of table references that exist and are accessible at the time while analyzing the query and preparing for execution. If at least one such table was found, any resolution failure yields a warning in the query status results, but won’t prevent the query execution. If no resolutions were successful, the query returns an error. The default is false.

isfuzzy=true only applies to the union sources resolution phase. Once the set of source tables is determined, possible additional query failures won’t be suppressed.
TablesstringOne or more comma-separated table references, a query expression enclosed with parenthesis, or a set of tables specified with a wildcard. For example, E* would form the union of all the tables in the database whose names begin E.

Supported union parameters

NameTypeRequiredDescription
hint.concurrencyintHints the system how many concurrent subqueries of the union operator should be executed in parallel. The default is the number of CPU cores on the single node of the cluster (2 to 16).
hint.spreadintHints the system how many nodes should be used by the concurrent union subqueries execution. The default is 1.
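
For example, the following sketch (using the Query and Command tables from the examples below) limits the union to a single concurrent subquery:

union hint.concurrency=1 Query, Command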

Whenever the list of tables is known, refrain from using wildcards. Some workspaces contain a very large number of tables, which would lead to inefficient execution. Tables may also be added over time, leading to unpredictable results.

Returns

A table with as many rows as there are in all the input tables.

Examples

Tables with string in name or column

union K* | where * has "Kusto"

Rows from all tables in the database whose name starts with K, and in which any column includes the word Kusto.

Distinct count

union withsource=SourceTable kind=outer Query, Command
| where Timestamp > ago(1d)
| summarize dcount(UserId)

The number of distinct users that have produced either a Query event or a Command event over the past day. In the result, the ‘SourceTable’ column will indicate either “Query” or “Command”.

Query
| where Timestamp > ago(1d)
| union withsource=SourceTable kind=outer 
   (Command | where Timestamp > ago(1d))
| summarize dcount(UserId)

This more efficient version produces the same result. It filters each table before creating the union.

Using isfuzzy=true

// Using union isfuzzy=true to access non-existing view:                                     
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true
(View_1 | where x > 0), 
(View_2 | where x > 0),
(View_3 | where x > 0)
| count 

Output

Count
2

Observing Query Status - the following warning returned: Failed to resolve entity 'View_3'

// Using union isfuzzy=true and wildcard access:
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true View*, SomeView*, OtherView*
| count 

Output

Count
3

Observing Query Status - the following warning returned: Failed to resolve entity 'SomeView*'

Source columns types mismatch

let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
union withsource=TableName View_1, View_2

Output

TableNamex_longx_int
View_11
View_22
let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
let View_3 = view () { print x_long=3 };
union withsource=TableName View_1, View_2, View_3 

Output

TableNamex_long1x_intx_long
View_11
View_22
View_33

Column x from View_1 received the suffix _long, and as a column named x_long already exists in the result schema, the column names were de-duplicated, producing a new column, x_long1.

45 - where operator

Learn how to use the where operator to filter a table to the subset of rows that satisfy a predicate.

Filters a table to the subset of rows that satisfy a predicate.

Syntax

T | where Predicate

Parameters

NameTypeRequiredDescription
Tstring✔️Tabular input whose records are to be filtered.
Predicatestring✔️Expression that evaluates to a bool for each row in T.

Returns

Rows in T for which Predicate is true.

Performance tips

  • Use simple comparisons between column names and constants. (‘Constant’ means constant over the table - so now() and ago() are OK, and so are scalar values assigned using a let statement.)

    For example, prefer where Timestamp >= ago(1d) to where bin(Timestamp, 1d) == ago(1d).

  • Simplest terms first: If you have multiple clauses conjoined with and, put first the clauses that involve just one column. So Timestamp > ago(1d) and OpId == EventId is better than the other way around.

For more information, see the summary of available String operators and the summary of available Numerical operators.
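
For example, a value assigned with a let statement behaves as a constant in the predicate. A sketch using the StormEvents sample table:

let minDamage = 1000;
StormEvents
| where DamageProperty >= minDamage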

Examples

Order comparisons by complexity

The following query returns storm records that report damaged property, are floods, and start and end in different places.

Notice that the comparison between two columns is placed last, since the where operator can’t use an index for it and it forces a scan.

StormEvents
| project DamageProperty, EventType, BeginLocation, EndLocation
| where DamageProperty > 0
    and EventType == "Flood"
    and BeginLocation != EndLocation 

The following table only shows the top 10 results. To see the full output, run the query.

DamagePropertyEventTypeBeginLocationEndLocation
5000FloodFAYETTE CITY LOWBER
5000FloodMORRISVILLE WEST WAYNESBURG
10000FloodCOPELAND HARRIS GROVE
5000FloodGLENFORD MT PERRY
25000FloodEAST SENECA BUFFALO AIRPARK ARPT
20000FloodEBENEZER SLOAN
10000FloodBUEL CALHOUN
10000FloodGOODHOPE WEST MILFORD
5000FloodDUNKIRK FOREST
20000FloodFARMINGTON MANNINGTON

Check if column contains string

The following query returns the rows in which the word “cow” appears in any column.

StormEvents
| where * has "cow"