Tabular operators
- 1: Join operator
- 1.1: join flavors
- 1.1.1: fullouter join
- 1.1.2: inner join
- 1.1.3: innerunique join
- 1.1.4: leftanti join
- 1.1.5: leftouter join
- 1.1.6: leftsemi join
- 1.1.7: rightanti join
- 1.1.8: rightouter join
- 1.1.9: rightsemi join
- 1.2: Broadcast join
- 1.3: Cross-cluster join
- 1.4: join operator
- 1.5: Joining within time window
- 2: Render operator
- 2.1: visualizations
- 2.1.1: Anomaly chart visualization
- 2.1.2: Area chart visualization
- 2.1.3: Bar chart visualization
- 2.1.4: Card visualization
- 2.1.5: Column chart visualization
- 2.1.6: Ladder chart visualization
- 2.1.7: Line chart visualization
- 2.1.8: Pie chart visualization
- 2.1.9: Pivot chart visualization
- 2.1.10: Plotly visualization
- 2.1.11: Scatter chart visualization
- 2.1.12: Stacked area chart visualization
- 2.1.13: Table visualization
- 2.1.14: Time chart visualization
- 2.1.15: Time pivot visualization
- 2.1.16: Treemap visualization
- 2.2: render operator
- 3: Summarize operator
- 4: as operator
- 5: consume operator
- 6: count operator
- 7: datatable operator
- 8: distinct operator
- 9: evaluate plugin operator
- 10: extend operator
- 11: externaldata operator
- 12: facet operator
- 13: find operator
- 14: fork operator
- 15: getschema operator
- 16: invoke operator
- 17: lookup operator
- 18: mv-apply operator
- 19: mv-expand operator
- 20: parse operator
- 21: parse-kv operator
- 22: parse-where operator
- 23: partition operator
- 24: print operator
- 25: Project operator
- 26: project-away operator
- 27: project-keep operator
- 28: project-rename operator
- 29: project-reorder operator
- 30: Queries
- 31: range operator
- 32: reduce operator
- 33: sample operator
- 34: sample-distinct operator
- 35: scan operator
- 36: search operator
- 37: serialize operator
- 38: Shuffle query
- 39: sort operator
- 40: take operator
- 41: top operator
- 42: top-hitters operator
- 43: top-nested operator
- 44: union operator
- 45: where operator
1 - Join operator
1.1 - join flavors
1.1.1 - fullouter join
A fullouter join combines the effect of applying both left and right outer joins. For columns of the table that lack a matching row, the result set contains null values. For those records that do match, a single row is produced in the result set containing fields populated from both tables.
Syntax
LeftTable | join kind=fullouter [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All records from both tables with unmatched cells populated with null.
Example
This example query combines rows from both tables X and Y, filling in missing values with NULL where there’s no match in the other table. This allows you to see all possible combinations of keys from both tables.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=fullouter Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 3 | b | 10 |
b | 2 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
| | d | 40 |
a | 1 | | |
Related content
- Learn about other join flavors
1.1.2 - inner join
The inner join flavor is like the standard inner join from the SQL world. An output record is produced whenever a record on the left side has the same join key as the record on the right side.
Syntax
LeftTable | join kind=inner [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: Only matching rows from both tables.
Example
The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'k',5,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40,
'k',50
];
X | join kind=inner Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 3 | b | 10 |
b | 2 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
k | 5 | k | 50 |
Related content
- Learn about other join flavors
1.1.3 - innerunique join
The innerunique join flavor removes duplicate keys from the left side. This behavior ensures that the output contains a row for every combination of unique left and right keys.
By default, the innerunique join flavor is used if the kind parameter isn’t specified. This default implementation is useful in log/trace analysis scenarios, where you aim to correlate two events based on a shared correlation ID. It allows you to retrieve all instances of the phenomenon while disregarding duplicate trace records that contribute to the correlation.
Syntax
LeftTable | join kind=innerunique [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All deduplicated rows from the left table that match rows from the right table.
Examples
Review the examples and run them in your Data Explorer query page.
Use the default innerunique join
The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 2 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
The query executed the default join, which is an inner join after deduplicating the left side based on the join key. The deduplication keeps only the first record. The resulting left side of the join after deduplication is:
Key | Value1 |
---|---|
a | 1 |
b | 2 |
c | 4 |
Two possible outputs from innerunique join
Because innerunique keeps only one of the duplicate left-side records, and either record can be the one kept, the following query has two valid outputs. Running the same query can return either of the results shown below.
let t1 = datatable(key: long, value: string)
[
1, "val1.1",
1, "val1.2"
];
let t2 = datatable(key: long, value: string)
[
1, "val1.3",
1, "val1.4"
];
t1
| join kind = innerunique
t2
on key
Output
key | value | key1 | value1 |
---|---|---|---|
1 | val1.1 | 1 | val1.3 |
1 | val1.1 | 1 | val1.4 |
let t1 = datatable(key: long, value: string)
[
1, "val1.1",
1, "val1.2"
];
let t2 = datatable(key: long, value: string)
[
1, "val1.3",
1, "val1.4"
];
t1
| join kind = innerunique
t2
on key
Output
key | value | key1 | value1 |
---|---|---|---|
1 | val1.2 | 1 | val1.3 |
1 | val1.2 | 1 | val1.4 |
- Kusto is optimized to push filters that come after the join towards the appropriate join side, left or right, when possible.
- Sometimes, the flavor used is innerunique and the filter is propagated to the left side of the join. The filter is automatically propagated, and the keys that apply to that filter appear in the output.
- Use the previous example and add a filter where value == "val1.2". It gives the second result and never gives the first result for these datasets:
let t1 = datatable(key: long, value: string)
[
1, "val1.1",
1, "val1.2"
];
let t2 = datatable(key: long, value: string)
[
1, "val1.3",
1, "val1.4"
];
t1
| join kind = innerunique
t2
on key
| where value == "val1.2"
Output
key | value | key1 | value1 |
---|---|---|---|
1 | val1.2 | 1 | val1.3 |
1 | val1.2 | 1 | val1.4 |
Get extended sign-in activities
Get extended activities from a login, where some entries mark the start and end of an activity.
let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityId, StartTime=timestamp
| join (Events
| where Name == "Stop"
| project StopTime=timestamp, ActivityId)
on ActivityId
| project City, ActivityId, StartTime, StopTime, Duration = StopTime - StartTime
When the key columns are renamed and have different names on each side, use the $left and $right notation in the join condition, as in the following variant:
let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityIdLeft = ActivityId, StartTime=timestamp
| join (Events
| where Name == "Stop"
| project StopTime=timestamp, ActivityIdRight = ActivityId)
on $left.ActivityIdLeft == $right.ActivityIdRight
| project City, ActivityId = ActivityIdLeft, StartTime, StopTime, Duration = StopTime - StartTime
Related content
- Learn about other join flavors
1.1.4 - leftanti join
The leftanti join flavor returns all records from the left side that don’t match any record from the right side. The anti join models the “NOT IN” query.
Syntax
LeftTable | join kind=leftanti [ Hints ] RightTable on Conditions
Returns
Schema: All columns from the left table.
Rows: All records from the left table that don’t match records from the right table.
Example
The example query combines rows from tables X and Y where there is no match in Y for the keys in X, effectively filtering out any rows in X that have corresponding rows in Y.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=leftanti Y on Key
Output
Key | Value1 |
---|---|
a | 1 |
Related content
- Learn about other join flavors
1.1.5 - leftouter join
The leftouter join flavor returns all the records from the left side table and only matching records from the right side table.
Syntax
LeftTable | join kind=leftouter [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All records from the left table and only matching rows from the right table.
Example
The result of a left outer join for tables X and Y always contains all records of the left table (X), even if the join condition doesn’t find any matching record in the right table (Y).
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=leftouter Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
a | 1 | ||
b | 2 | b | 10 |
b | 3 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
Related content
- Learn about other join flavors
1.1.6 - leftsemi join
The leftsemi join flavor returns all records from the left side that match a record from the right side. Only columns from the left side are returned.
Syntax
LeftTable | join kind=leftsemi [ Hints ] RightTable on Conditions
Returns
Schema: All columns from the left table.
Rows: All records from the left table that match records from the right table.
Example
This query filters and returns only those rows from table X that have a matching key in table Y.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=leftsemi Y on Key
Output
Key | Value1 |
---|---|
b | 2 |
b | 3 |
c | 4 |
Related content
- Learn about other join flavors
1.1.7 - rightanti join
The rightanti join flavor returns all records from the right side that don’t match any record from the left side. The anti join models the “NOT IN” query.
Syntax
LeftTable | join kind=rightanti [ Hints ] RightTable on Conditions
Returns
Schema: All columns from the right table.
Rows: All records from the right table that don’t match records from the left table.
Example
This query filters and returns only those rows from table Y that do not have a matching key in table X.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=rightanti Y on Key
Output
Key | Value1 |
---|---|
d | 40 |
Related content
- Learn about other join flavors
1.1.8 - rightouter join
The rightouter join flavor returns all the records from the right side and only matching records from the left side. This join flavor resembles the leftouter join flavor, but the treatment of the tables is reversed.
Syntax
LeftTable | join kind=rightouter [ Hints ] RightTable on Conditions
Returns
Schema: All columns from both tables, including the matching keys.
Rows: All records from the right table and only matching rows from the left table.
Example
This query returns all rows from table Y and any matching rows from table X, filling in NULL values where there is no match from X.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=rightouter Y on Key
Output
Key | Value1 | Key1 | Value2 |
---|---|---|---|
b | 2 | b | 10 |
b | 3 | b | 10 |
c | 4 | c | 20 |
c | 4 | c | 30 |
| | d | 40 |
Related content
- Learn about other join flavors
1.1.9 - rightsemi join
The rightsemi join flavor returns all records from the right side that match a record from the left side. Only columns from the right side are returned.
Syntax
LeftTable | join kind=rightsemi [ Hints ] RightTable on Conditions
Returns
Schema: All columns from the right table.
Rows: All records from the right table that match records from the left table.
Example
This query filters and returns only those rows from table Y that have a matching key in table X.
let X = datatable(Key:string, Value1:long)
[
'a',1,
'b',2,
'b',3,
'c',4
];
let Y = datatable(Key:string, Value2:long)
[
'b',10,
'c',20,
'c',30,
'd',40
];
X | join kind=rightsemi Y on Key
Output
Key | Value2 |
---|---|
b | 10 |
c | 20 |
c | 30 |
Related content
- Learn about other join flavors
1.2 - Broadcast join
Today, regular joins are executed on a single cluster node. Broadcast join is an execution strategy of join that distributes the join over cluster nodes. This strategy is useful when the left side of the join is small (up to several tens of MBs). In this case, a broadcast join is more performant than a regular join.
Use the lookup operator if the right side is smaller than the left side. The lookup operator runs in broadcast strategy by default when the right side is smaller than the left.
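For example, here's a minimal, self-contained sketch of that pattern. The inline tables are purely illustrative stand-ins for a large fact table (left side) and a small dimension table (right side):
let FactTable = datatable(Key:string, Value:long)
[
    'a', 1,
    'b', 2,
    'a', 3
];
let DimTable = datatable(Key:string, Label:string)
[
    'a', 'Alpha',
    'b', 'Beta'
];
// lookup broadcasts the smaller right side (DimTable) by default.
FactTable
| lookup kind=leftouter DimTable on Key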
If the left side of the join is a small dataset, you can run the join in broadcast mode by using the following syntax (hint.strategy = broadcast):
leftSide
| join hint.strategy = broadcast (factTable) on key
The performance improvement is more noticeable in scenarios where the join is followed by other operators, such as summarize. See the following query for example:
leftSide
| join hint.strategy = broadcast (factTable) on Key
| summarize dcount(Messages) by Timestamp, Key
1.3 - Cross-cluster join
A cross-cluster join involves joining data from datasets that reside in different clusters.
In a cross-cluster join, the query can be executed in three possible locations, each with a specific designation for reference throughout this document:
- Local cluster: The cluster to which the request is sent, which is also known as the cluster hosting the database in context.
- Left cluster: The cluster hosting the data on the left side of the join operation.
- Right cluster: The cluster hosting the data on the right side of the join operation.
The cluster that runs the query fetches the data from the other cluster.
Syntax
[ cluster(ClusterName).database(DatabaseName). ] LeftTable | ... | join [ hint.remote=Strategy ] ( [ cluster(ClusterName).database(DatabaseName). ] RightTable | ... ) on Conditions
Parameters
Name | Type | Required | Description |
---|---|---|---|
LeftTable | string | ✔️ | The left table or tabular expression whose rows are to be merged. Denoted as $left . |
Strategy | string | Determines the cluster on which to execute the join. Supported values are: left , right , local , and auto . For more information, see Strategies. | |
ClusterName | string | If the data for the join resides outside of the local cluster, use the cluster() function to specify the cluster. | |
DatabaseName | string | If the data for the join resides outside of the local database context, use the database() function to specify the database. | |
RightTable | string | ✔️ | The right table or tabular expression whose rows are to be merged. Denoted as $right . |
Conditions | string | ✔️ | Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left. LeftColumn == $right. RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator. |
Strategies
The following list explains the supported values for the Strategy parameter:
- left: Execute the join on the cluster of the left table, or left cluster.
- right: Execute the join on the cluster of the right table, or right cluster.
- local: Execute the join on the current cluster, or local cluster.
- auto: (Default) Kusto makes the remoting decision.
How the auto strategy works
By default, the auto strategy determines where the cross-cluster join is executed based on the following rules:
If one of the tables is hosted in the local cluster, then the join is performed on the local cluster. For example, with the auto strategy, this query is executed on the local cluster:
T | ... | join (cluster("B").database("DB").T2 | ...) on Col1
If both tables are hosted outside of the local cluster, then join is performed on the right cluster. For example, assuming neither cluster is the local cluster, the join would be executed on the right cluster:
cluster("B").database("DB").T | ... | join (cluster("C").database("DB2").T2 | ...) on Col1
Performance considerations
For optimal performance, we recommend running the query on the cluster that contains the largest table.
In the following example, if the dataset produced by T | ... is smaller than the one produced by cluster("B").database("DB").T2 | ..., then it would be more efficient to execute the join operation on cluster B, in this case the right cluster, instead of on the local cluster.
T | ... | join (cluster("B").database("DB").T2 | ...) on Col1
You can rewrite the query to use hint.remote=right to optimize performance. In this way, the join operation is performed on the right cluster, even if the left table is in the local cluster.
T | ... | join hint.remote=right (cluster("B").database("DB").T2 | ...) on Col1
1.4 - join operator
Merge the rows of two tables to form a new table by matching values of the specified columns from each table.
Kusto Query Language (KQL) offers many kinds of joins that each affect the schema and rows in the resultant table in different ways. For example, if you use an inner join, the table has the same columns as the left table, plus the columns from the right table. For best performance, if one table is always smaller than the other, use it as the left side of the join operator.
The following image provides a visual representation of the operation performed by each join. The color of the shading represents the columns returned, and the areas shaded represent the rows returned.
Syntax
LeftTable | join [ kind = JoinFlavor ] [ Hints ] ( RightTable ) on Conditions
Parameters
Name | Type | Required | Description |
---|---|---|---|
LeftTable | string | ✔️ | The left table or tabular expression, sometimes called the outer table, whose rows are to be merged. Denoted as $left . |
JoinFlavor | string | The type of join to perform: innerunique , inner , leftouter , rightouter , fullouter , leftanti , rightanti , leftsemi , rightsemi . The default is innerunique . For more information about join flavors, see Returns. | |
Hints | string | Zero or more space-separated join hints in the form of Name = Value that control the behavior of the row-match operation and execution plan. For more information, see Hints. | |
RightTable | string | ✔️ | The right table or tabular expression, sometimes called the inner table, whose rows are to be merged. Denoted as $right . |
Conditions | string | ✔️ | Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left. LeftColumn == $right. RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator. |
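As an illustration of the Conditions syntax, the following self-contained sketch (the inline tables and column names are hypothetical) joins on differently named key columns and on a second, comma-separated condition:
let Orders = datatable(OrderId:string, Region:string, Amount:long)
[
    'o1', 'east', 100,
    'o2', 'west', 200
];
let Shipments = datatable(Id:string, Region:string, ShipDate:datetime)
[
    'o1', 'east', datetime(2017-10-01),
    'o2', 'west', datetime(2017-10-03)
];
// Both conditions must hold; the comma is evaluated as a logical "and".
Orders
| join kind=inner Shipments on $left.OrderId == $right.Id, $left.Region == $right.Region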
Hints
Hint key | Values | Description |
---|---|---|
hint.remote | auto , left , local , right | See Cross-Cluster Join |
hint.strategy=broadcast | Specifies the way to share the query load on cluster nodes. | See broadcast join |
hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. | See shuffle query |
hint.strategy=shuffle | The shuffle strategy query shares the query load on cluster nodes, where each node processes one partition of the data. | See shuffle query |
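As a rough, syntax-only sketch of the shuffle hints (the data here is synthetic; in practice shuffling pays off when both sides are large and the join key has high cardinality):
let T1 = range i from 1 to 1000 step 1 | extend Key = tostring(i % 100);
let T2 = range j from 1 to 1000 step 1 | extend Key = tostring(j % 100);
// Partition the join work across cluster nodes by the join key.
T1
| join hint.shufflekey = Key (T2) on Key
| count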
Returns
The return schema and rows depend on the join flavor. The join flavor is specified with the kind keyword. The following table shows the supported join flavors. To see examples for a specific join flavor, select the link in the Join flavor column.
Join flavor | Returns |
---|---|
innerunique (default) | Inner join with left side deduplication. Schema: All columns from both tables, including the matching keys. Rows: All deduplicated rows from the left table that match rows from the right table. |
inner | Standard inner join. Schema: All columns from both tables, including the matching keys. Rows: Only matching rows from both tables. |
leftouter | Left outer join. Schema: All columns from both tables, including the matching keys. Rows: All records from the left table and only matching rows from the right table. |
rightouter | Right outer join. Schema: All columns from both tables, including the matching keys. Rows: All records from the right table and only matching rows from the left table. |
fullouter | Full outer join. Schema: All columns from both tables, including the matching keys. Rows: All records from both tables with unmatched cells populated with null. |
leftsemi | Left semi join. Schema: All columns from the left table. Rows: All records from the left table that match records from the right table. |
leftanti, anti, leftantisemi | Left anti join and semi variant. Schema: All columns from the left table. Rows: All records from the left table that don't match records from the right table. |
rightsemi | Right semi join. Schema: All columns from the right table. Rows: All records from the right table that match records from the left table. |
rightanti, rightantisemi | Right anti join and semi variant. Schema: All columns from the right table. Rows: All records from the right table that don't match records from the left table. |
Cross-join
KQL doesn’t provide a cross-join flavor. However, you can achieve a cross-join effect by using a placeholder key approach.
In the following example, a placeholder key is added to both tables and then used for the inner join operation, effectively achieving a cross-join-like behavior:
X | extend placeholder=1 | join kind=inner (Y | extend placeholder=1) on placeholder
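For instance, here's a self-contained sketch of that pattern with two small inline tables (the table and column names are only illustrative):
let Colors = datatable(Color:string) ['red', 'green'];
let Sizes = datatable(Size:string) ['S', 'M', 'L'];
// Every row of Colors pairs with every row of Sizes: 2 x 3 = 6 output rows.
Colors
| extend placeholder=1
| join kind=inner (Sizes | extend placeholder=1) on placeholder
| project Color, Size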
1.5 - Joining within time window
It’s often useful to join between two large datasets on some high-cardinality key, such as an operation ID or a session ID, and further limit the right-hand-side ($right) records that need to match up with each left-hand-side ($left) record by adding a restriction on the “time-distance” between datetime columns on the left and on the right.
The above operation differs from the usual join operation, since for the equi-join part of matching the high-cardinality key between the left and right datasets, the system can also apply a distance function and use it to considerably speed up the join.
Example to identify event sequences without time window
To identify event sequences within a relatively small time window, this example uses a table T with the following schema:
- SessionId: A column of type string with correlation IDs.
- EventType: A column of type string that identifies the event type of the record.
- Timestamp: A column of type datetime that indicates when the event described by the record happened.
SessionId | EventType | Timestamp |
---|---|---|
0 | A | 2017-10-01T00:00:00Z |
0 | B | 2017-10-01T00:01:00Z |
1 | B | 2017-10-01T00:02:00Z |
1 | A | 2017-10-01T00:03:00Z |
3 | A | 2017-10-01T00:04:00Z |
3 | B | 2017-10-01T00:10:00Z |
The following query creates the dataset and then identifies all the session IDs in which event type A was followed by an event type B within a 1min time window.
let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
'0', 'A', datetime(2017-10-01 00:00:00),
'0', 'B', datetime(2017-10-01 00:01:00),
'1', 'B', datetime(2017-10-01 00:02:00),
'1', 'A', datetime(2017-10-01 00:03:00),
'3', 'A', datetime(2017-10-01 00:04:00),
'3', 'B', datetime(2017-10-01 00:10:00),
];
T
| where EventType == 'A'
| project SessionId, Start=Timestamp
| join kind=inner
(
T
| where EventType == 'B'
| project SessionId, End=Timestamp
) on SessionId
| where (End - Start) between (0min .. 1min)
| project SessionId, Start, End
Output
SessionId | Start | End |
---|---|---|
0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |
Example optimized with time window
To optimize this query, we can rewrite it to account for the time window, expressing the time window as a join key. Rewrite the query so that the datetime values are “discretized” into buckets whose size is half the size of the time window, and use equi-join to compare the bucket IDs.
The query finds pairs of events within the same session (SessionId) where an ‘A’ event is followed by a ‘B’ event within 1 minute. It projects the session ID, the start time of the ‘A’ event, and the end time of the ‘B’ event.
let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
'0', 'A', datetime(2017-10-01 00:00:00),
'0', 'B', datetime(2017-10-01 00:01:00),
'1', 'B', datetime(2017-10-01 00:02:00),
'1', 'A', datetime(2017-10-01 00:03:00),
'3', 'A', datetime(2017-10-01 00:04:00),
'3', 'B', datetime(2017-10-01 00:10:00),
];
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T
| where EventType == 'A'
| project SessionId, Start=Timestamp, TimeKey = bin(Timestamp, lookupBin)
| join kind=inner
(
T
| where EventType == 'B'
| project SessionId, End=Timestamp,
TimeKey = range(bin(Timestamp-lookupWindow, lookupBin),
bin(Timestamp, lookupBin),
lookupBin)
| mv-expand TimeKey to typeof(datetime)
) on SessionId, TimeKey
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End
Output
SessionId | Start | End |
---|---|---|
0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |
5 million record query
The next query emulates an extensive dataset of 5M records and approximately 1M Session IDs and runs the query with the time window technique.
let T = range x from 1 to 5000000 step 1
| extend SessionId = rand(1000000), EventType = rand(3), Time=datetime(2017-01-01)+(x * 10ms)
| extend EventType = case(EventType < 1, "A",
EventType < 2, "B",
"C");
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T
| where EventType == 'A'
| project SessionId, Start=Time, TimeKey = bin(Time, lookupBin)
| join kind=inner
(
T
| where EventType == 'B'
| project SessionId, End=Time,
TimeKey = range(bin(Time-lookupWindow, lookupBin),
bin(Time, lookupBin),
lookupBin)
| mv-expand TimeKey to typeof(datetime)
) on SessionId, TimeKey
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End
| count
Output
Count |
---|
3344 |
2 - Render operator
2.1 - visualizations
2.1.1 - Anomaly chart visualization
The anomaly chart visualization is similar to a timechart, but highlights anomalies using the series_decompose_anomalies function.
Syntax
T | render anomalychart [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see Multiple y-axes. |
ytitle | The title of the y-axis (of type string ). |
anomalycolumns | Comma-delimited list of columns that are considered anomaly series and displayed as points on the chart. |
ysplit property
This visualization supports splitting into multiple y-axis values. The supported values of this property are:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
Example
The example in this section shows how to use the syntax to help you get started.
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') //use "| render anomalychart with anomalycolumns=anomalies" to render the anomalies as bold points on the series charts.
2.1.2 - Area chart visualization
The area chart visual shows a time-series relationship. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.
Syntax
T | render areachart [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the y-axis values for multiple visualizations. |
ytitle | The title of the y-axis (of type string ). |
ysplit property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
kind property
This visualization can be further elaborated by providing the kind property.
The supported values of this property are:
kind value | Description |
---|---|
default | Each “area” stands on its own. |
unstacked | Same as default . |
stacked | Stack “areas” to the right. |
stacked100 | Stack “areas” to the right and stretch each one to the same width as the others. |
Examples
The examples in this section show how to use the syntax to help you get started.
Simple area chart
The following example shows a basic area chart visualization.
demo_series3
| render areachart
Area chart using properties
The following example shows an area chart using multiple property settings.
OccupancyDetection
| summarize avg_temp= avg(Temperature), avg_humidity= avg(Humidity) by bin(Timestamp, 1h)
| render areachart
with (
kind = unstacked,
legend = visible,
ytitle ="Sample value",
ymin = 10,
ymax =100,
xtitle = "Time",
title ="Humidity and temperature"
)
Area chart using split panels
The following example shows an area chart using split panels. In this example, the ysplit property is set to panels.
StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count=count() by State, bin(StartTime, 1d)
| render areachart
with (
ysplit= panels,
legend = visible,
ycolumns=count,
yaxis =log,
ytitle ="Count",
ymin = 0,
ymax =100,
xaxis = linear,
xcolumn = StartTime,
xtitle = "Date",
title ="Hail events"
)
2.1.3 - Bar chart visualization
The bar chart visual needs a minimum of two columns in the query result. By default, the first column is used as the y-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the x-axis and contain numeric data types to be displayed as horizontal lines. Bar charts are used mainly for comparing numeric and nominal discrete values, where the length of each line represents its value.
Syntax
T | render barchart [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors (true or false ). |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ysplit property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. This is the default. |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
kind property
This visualization can be further elaborated by providing the kind property.
The supported values of this property are:
kind value | Description |
---|---|
default | Each “bar” stands on its own. |
unstacked | Same as default . |
stacked | Stack “bars”. |
stacked100 | Stack “bars” and stretch each one to the same width as the others. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a bar chart
The following query creates a bar chart displaying the number of storm events for each state, filtering only those states with more than 10 events. The chart provides a visual representation of the event distribution across different states.
StormEvents
| summarize event_count=count() by State
| project State, event_count
| render barchart
with (
title="Storm count by state",
ytitle="Storm count",
xtitle="State",
legend=hidden
)
Render a stacked bar chart
The following query creates a stacked bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. Each bar represents a storm event type, and the stacked bars show the breakdown of storm events by state within each type.
StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked)
Render a stacked100 bar chart
The following query creates a stacked100 bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. The chart shows the distribution of storm events across states within each type. Although the stacks visually sum up to 100, the values actually represent the number of events, not percentages. This visualization is helpful for understanding both the percentages and the actual event counts.
StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked100)
Use the ysplit property
The following query provides a daily summary of storm-related injuries and deaths, visualized as a bar chart with split axes/panels for better comparison.
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=axes)
To split the view into separate panels, specify panels instead of axes:
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=panels)
2.1.4 - Card visualization
The card visual only shows one element. If there are multiple columns and rows in the output, the first result record is treated as a set of scalar values and is shown as a card.
Syntax
T | render card [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
title | The title of the visualization (of type string ). |
Example
This query provides a count of flood events in Virginia and displays the result in a card format.
StormEvents
| where State=="VIRGINIA" and EventType=="Flood"
| count
| render card with (title="Floods in Virginia")
2.1.5 - Column chart visualization
The column chart visual needs a minimum of two columns in the query result. By default, the first column is used as the x-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the y-axis and contain numeric data types to be displayed as vertical lines. Column charts are used for comparing specific subcategory items in a main category range, where the length of each line represents its value.
Syntax
T | render columnchart [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ysplit property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. This is the default. |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
kind property
This visualization can be further elaborated by providing the kind property.
The supported values of this property are:
kind value | Definition |
---|---|
default | Each “column” stands on its own. |
unstacked | Same as default . |
stacked | Stack “columns” one atop the other. |
stacked100 | Stack “columns” and stretch each one to the same height as the others. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a column chart
This query provides a visual representation of states with a high frequency of storm events, specifically those with more than 10 events, using a column chart.
StormEvents
| summarize event_count=count() by State
| where event_count > 10
| project State, event_count
| render columnchart
Use the ysplit property
This query provides a daily summary of storm-related injuries and deaths, visualized as a column chart with split axes/panels for better comparison.
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=axes)
To split the view into separate panels, specify panels instead of axes:
StormEvents
| summarize
TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=panels)
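Use the kind property
As a minimal sketch that combines the kind property with the same StormEvents sample table used in the other examples, the following query stacks the per-state counts within each event type:
StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| render columnchart with (kind=stacked)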
2.1.6 - Ladder chart visualization
The last two columns are the x-axis, and the other columns are the y-axis.
Syntax
T | render ladderchart [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Examples
The examples in this section show how to use the syntax to help you get started.
The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.
Dates of storms by state
This query outputs a state-wise visualization of the duration of rain-related storm events, displayed as a ladder chart to help you analyze the temporal distribution of these events.
StormEvents
| where EventType has "rain"
| summarize min(StartTime), max(EndTime) by State
| render ladderchart
Dates of storms by event type
This query outputs a visualization of the duration of various storm events in Washington, displayed as a ladder chart to help you analyze the temporal distribution of these events by type.
StormEvents
| where State == "WASHINGTON"
| summarize min(StartTime), max(EndTime) by EventType
| render ladderchart
Dates of storms by state and event type
This query outputs a visualization of the duration of various storm events in states starting with “W”, displayed as a ladder chart to help you analyze the temporal distribution of these events by state and event type.
StormEvents
| where State startswith "W"
| summarize min(StartTime), max(EndTime) by State, EventType
| render ladderchart with (series=State, EventType)
2.1.7 - Line chart visualization
The line chart visual is the most basic type of chart. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Line charts track changes over short and long periods of time. When smaller changes exist, line graphs are more useful than bar graphs.
Syntax
T | render linechart [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors (true or false ). |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ytitle | The title of the y-axis (of type string ). |
ysplit property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a line chart
This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart.
StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart
Label a line chart
This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified titles for better clarity and presentation.
StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart
with (
title="Property damage from storms in Virginia",
xtitle="Start time of storm",
ytitle="Property damage"
)
Limit values displayed on the y-axis
This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified y-axis limits for better visualization of the data.
StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart with (ymin=7000, ymax=300000)
View multiple y-axes
This query retrieves hail events in Texas, Nebraska, and Kansas. It counts the number of hail events per day for each state, and then displays this information in a line chart with separate panels for each state.
StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render linechart with (ysplit=panels)
2.1.8 - Pie chart visualization
The pie chart visual needs a minimum of two columns in the query result. By default, the first column is used as the color axis. This column can contain text, datetime, or numeric data types. Other columns will be used to determine the size of each slice and contain numeric data types. Pie charts are used for presenting a composition of categories and their proportions out of a total.
The pie chart visual can also be used in the context of Geospatial visualizations.
Syntax
T | render piechart [ with ( propertyName = propertyValue [, ...] ) ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
kind property
This visualization can be further elaborated by providing the kind property.
The supported values of this property are:
kind value | Description |
---|---|
map | Expected columns are [Longitude, Latitude] or GeoJSON point, color-axis and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations |
Example
This query provides a visual representation of the top 10 states with the highest number of storm events, displayed as a pie chart.
StormEvents
| summarize statecount=count() by State
| sort by statecount
| limit 10
| render piechart with(title="Storm Events by State")
2.1.9 - Pivot chart visualization
Displays a pivot table and chart. You can interactively select data, columns, rows, and various chart types.
Syntax
T | render pivotchart
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
Example
This query provides a detailed analysis of sales for Contoso computer products within the specified date range, visualized as a pivot chart.
SalesFact
| join kind= inner Products on ProductKey
| where ProductCategoryName has "Computers" and ProductName has "Contoso"
| where DateKey between (datetime(2006-12-31) .. datetime(2007-02-01))
| project SalesAmount, ProductName, DateKey
| render pivotchart
2.1.10 - Plotly visualization
The Plotly graphics library supports ~80 chart types that are useful for advanced charting including geographic, scientific, machine learning, 3d, animation, and many other chart types. For more information, see Plotly.
To render a Plotly visual in Kusto Query Language, the query must generate a table with a single string cell containing Plotly JSON. This Plotly JSON string can be generated by one of the following two methods:
Write your own Plotly visualization in Python
In this method, you dynamically create the Plotly JSON string in Python using the Plotly package. This process requires use of the python() plugin. The Python script is run on the existing nodes using the inline python() plugin. It generates a Plotly JSON that is rendered by the client application.
All types of Plotly visualizations are supported.
Example
The following query uses inline Python to create a 3D scatter chart:
OccupancyDetection
| project Temperature, Humidity, CO2, Occupancy
| where rand() < 0.1
| evaluate python(typeof(plotly:string),
```if 1:
import plotly.express as px
fig = px.scatter_3d(df, x='Temperature', y='Humidity', z='CO2', color='Occupancy')
fig.update_layout(title=dict(text="Occupancy detection, plotly 5.11.0"))
plotly_obj = fig.to_json()
result = pd.DataFrame(data = [plotly_obj], columns = ["plotly"])
```)
If the python() plugin isn't available in your environment, you can create this Plotly JSON using a preprepared template.
Use a preprepared Plotly template
In this method, a preprepared Plotly JSON for specific visualization can be reused by replacing the data objects with the required data to be rendered. The templates can be stored in a standard table, and the data replacement logic can be packed in a stored function.
Currently, the supported templates are: plotly_anomaly_fl() and plotly_scatter3d_fl(). Refer to these documents for syntax and usage.
Example
let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
tbl_ex
| serialize
| summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
| summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
| extend plotly = scatter3d_chart
| extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
| extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
| extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
| extend plotly=replace_string(plotly, '$X_NAME$', x_col)
| extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
| extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
| extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
| extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
| extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
| extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
| extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
| extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
| extend plotly=replace_string(plotly, '$TITLE$', chart_title)
| project plotly
};
Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
| render plotly
Related content
2.1.11 - Scatter chart visualization
In a scatter chart visual, the first column is the x-axis and should be a numeric column. Other numeric columns are y-axes. Scatter plots are used to observe relationships between variables. The scatter chart visual can also be used in the context of Geospatial visualizations.
Syntax
T |
render
scatterchart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
kind property
This visualization can be further elaborated by providing the kind property.
The supported values of this property are:
kind value | Description |
---|---|
map | Expected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations. |
Example
This query provides a scatter chart that helps you analyze the correlation between state populations and the total property damage caused by storm events.
StormEvents
| summarize sum(DamageProperty) by State
| lookup PopulationData on State
| project-away State
| render scatterchart with (xtitle="State population", title="Property damage by state", legend=hidden)
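To use the map kind described above, project longitude and latitude columns before rendering. The following is a minimal sketch, assuming the BeginLon and BeginLat columns of the StormEvents sample table:
StormEvents
| take 100
| project BeginLon, BeginLat // longitude and latitude, as expected by kind=map
| render scatterchart with (kind=map)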
2.1.12 - Stacked area chart visualization
The stacked area chart visual shows a continuous relationship. This visual is similar to the Area chart, but shows the area under each element of a series. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.
Syntax
T |
render
stackedareachart
[with
(
propertyName =
propertyValue [,
…])
]
Supported parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Example
The following query summarizes data from the nyc_taxi
table by number of passengers and visualizes the data in a stacked area chart. The x-axis shows the pickup time in two-day intervals, and the stacked areas represent different passenger counts.
nyc_taxi
| summarize count() by passenger_count, bin(pickup_datetime, 2d)
| render stackedareachart with (xcolumn=pickup_datetime, series=passenger_count)
Output
Related content
2.1.13 - Table visualization
Default - results are shown as a table.
Syntax
T |
render
table
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Example
This query outputs a snapshot of the first 10 storm event records, displayed in a table format.
StormEvents
| take 10
| render table
2.1.14 - Time chart visualization
A time chart visual is a type of line graph. The first column of the query is the x-axis, and should be a datetime. Other numeric columns are y-axes. The values of one string column are used to group the numeric columns and create different lines in the chart; other string columns are ignored. The time chart visual is like a line chart except the x-axis is always time.
Syntax
T |
render
timechart
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors (true or false ). |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ytitle | The title of the y-axis (of type string ). |
ysplit property
This visualization supports splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
Examples
The examples in this section show how to use the syntax to help you get started.
Render a timechart
The following example renders a timechart titled "Web app. traffic over a month, decomposition" that decomposes the data into baseline, seasonal, trend, and residual components.
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == 'TS1' // select a single time series for a cleaner visualization
| extend (baseline, seasonal, trend, residual) = series_decompose(num, -1, 'linefit') // decomposition of a set of time series to seasonal, trend, residual, and baseline (seasonal+trend)
| render timechart with(title='Web app. traffic over a month, decomposition')
Label a timechart
The following example renders a timechart that depicts crop damage grouped by week. The timechart x axis label is “Date” and the y axis label is “Crop damage.”
StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31))
and DamageCrops > 0
| summarize EventCount = count() by bin(StartTime, 7d)
| render timechart
with (
title="Crop damage over time",
xtitle="Date",
ytitle="Crop damage",
legend=hidden
)
View multiple y-axes
The following example renders daily hail events in the states of Texas, Nebraska, and Kansas. The visualization uses the ysplit
property to render each state’s events in separate panels for comparison.
StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render timechart with (ysplit=panels)
Related content
2.1.15 - Time pivot visualization
The time pivot visualization provides interactive navigation over the event timeline, pivoting on the time axis.
Syntax
T |
render
timepivot
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ytitle | The title of the y-axis (of type string ). |
Example
This query outputs a visualization of flood events in the specified Midwestern states, displayed as a time pivot chart.
let midwesternStates = dynamic([
"ILLINOIS", "INDIANA", "IOWA", "KANSAS", "MICHIGAN", "MINNESOTA",
"MISSOURI", "NEBRASKA", "NORTH DAKOTA", "OHIO", "SOUTH DAKOTA", "WISCONSIN"
]);
StormEvents
| where EventType == "Flood" and State in (midwesternStates)
| render timepivot with (xcolumn=State)
Output
(Screenshot of the time pivot visualization in Kusto.Explorer.)
2.1.16 - Treemap visualization
Treemaps display hierarchical data as a set of nested rectangles. Each level of the hierarchy is represented by a colored rectangle (branch) containing smaller rectangles (leaves).
Syntax
T |
render
treemap
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Supported properties
All properties are optional.
PropertyName | PropertyValue |
---|---|
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
Example
This query counts the number of storm events for each type and state, sorts them in descending order, limits the results to the top 30, and then visualizes the data as a treemap.
StormEvents
| summarize StormEvents=count() by EventType, State
| sort by StormEvents
| limit 30
| render treemap with(title="Storm Events by EventType and State")
2.2 - render operator
Instructs the user agent to render a visualization of the query results.
The render operator must be the last operator in the query, and can only be used with queries that produce a single tabular data stream result. The render operator doesn’t modify data. It injects an annotation (“Visualization”) into the result’s extended properties. The annotation contains the information provided by the operator in the query. The interpretation of the visualization information is done by the user agent. Different agents, such as Kusto.Explorer or Azure Data Explorer web UI, may support different visualizations.
The data model of the render operator looks at the tabular data as if it has three kinds of columns:
- The x-axis column (indicated by the xcolumn property).
- The series columns (any number of columns indicated by the series property). For each record, the combined values of these columns define a single series, and the chart has as many series as there are distinct combined values.
- The y-axis columns (any number of columns indicated by the ycolumns property). For each record, the series has as many measurements ("points" in the chart) as there are y-axis columns.

The user agent attempts to guess the role of any columns not specified by the query. In particular, having "uninteresting" columns in the schema of the result might translate into it guessing wrong. Try projecting-away such columns when that happens.
Syntax
T |
render
visualization [with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Input table name. |
visualization | string | ✔️ | Indicates the kind of visualization to use. Must be one of the supported values in the following list. |
propertyName, propertyValue | string | A comma-separated list of key-value property pairs. See supported properties. |
Visualization
Visualization | Description |
---|---|
anomalychart | Similar to timechart, but highlights anomalies using the series_decompose_anomalies function. |
areachart | Area graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes. |
barchart | First column is the x-axis and can be text, datetime, or numeric. Other columns are numeric, displayed as horizontal strips. |
card | First result record is treated as set of scalar values and shows as a card. |
columnchart | Like barchart with vertical strips instead of horizontal strips. |
ladderchart | Last two columns are the x-axis, other columns are y-axis. |
linechart | Line graph. |
piechart | First column is color-axis, second column is numeric. |
pivotchart | Displays a pivot table and chart. User can interactively select data, columns, rows, and various chart types. |
scatterchart | Points graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes. |
stackedareachart | Stacked area graph. |
table | Default - results are shown as a table. |
timechart | Line graph. First column is x-axis, and must be datetime. Other (numeric) columns are y-axes. One string column's values are used to group the numeric columns and create different lines in the chart (further string columns are ignored). |
timepivot | Interactive navigation over the events time-line (pivoting on time axis). |
treemap | Displays hierarchical data as a set of nested rectangles. |
Supported properties
PropertyName/PropertyValue indicate additional information to use when rendering. All properties are optional. The supported properties are:
PropertyName | PropertyValue |
---|---|
accumulate | Whether the value of each measure gets added to all its predecessors. (true or false ) |
kind | Further elaboration of the visualization kind. For more information, see kind property. |
legend | Whether to display a legend or not (visible or hidden ). |
series | Comma-delimited list of columns whose combined per-record values define the series that record belongs to. |
ymin | The minimum value to be displayed on Y-axis. |
ymax | The maximum value to be displayed on Y-axis. |
title | The title of the visualization (of type string ). |
xaxis | How to scale the x-axis (linear or log ). |
xcolumn | Which column in the result is used for the x-axis. |
xtitle | The title of the x-axis (of type string ). |
yaxis | How to scale the y-axis (linear or log ). |
ycolumns | Comma-delimited list of columns that consist of the values provided per value of the x column. |
ysplit | How to split the visualization into multiple y-axis values. For more information, see ysplit property. |
ytitle | The title of the y-axis (of type string ). |
anomalycolumns | Property relevant only for anomalychart . Comma-delimited list of columns, which will be considered as anomaly series and displayed as points on the chart |
kind property
This visualization can be further elaborated by providing the kind property.
The supported values of this property are:
Visualization | kind | Description |
---|---|---|
areachart | default | Each “area” stands on its own. |
unstacked | Same as default . | |
stacked | Stack “areas” to the right. | |
stacked100 | Stack “areas” to the right and stretch each one to the same width as the others. | |
barchart | default | Each “bar” stands on its own. |
unstacked | Same as default . | |
stacked | Stack “bars”. | |
stacked100 | Stack “bars” and stretch each one to the same width as the others. | |
columnchart | default | Each “column” stands on its own. |
unstacked | Same as default . | |
stacked | Stack “columns” one atop the other. | |
stacked100 | Stack “columns” and stretch each one to the same height as the others. | |
scatterchart | map | Expected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations. |
piechart | map | Expected columns are [Longitude, Latitude] or GeoJSON point, color-axis and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations. |
ysplit property
Some visualizations support splitting into multiple y-axis values:
ysplit | Description |
---|---|
none | A single y-axis is displayed for all series data. (Default) |
axes | A single chart is displayed with multiple y-axes (one per series). |
panels | One chart is rendered for each ycolumn value. Maximum five panels. |
How to render continuous data
Several visualizations are used for rendering sequences of values, for example, linechart, timechart, and areachart.
These visualizations have the following conceptual model:
- One column in the table represents the x-axis of the data. This column can be explicitly defined using the xcolumn property. If not defined, the user agent picks the first column that is appropriate for the visualization.
  - For example: in the timechart visualization, the user agent uses the first datetime column.
  - If this column is of type dynamic and it holds an array, the individual values in the array are treated as the values of the x-axis.
- One or more columns in the table represent one or more measures that vary by the x-axis. These columns can be explicitly defined using the ycolumns property. If not defined, the user agent picks all columns that are appropriate for the visualization.
  - For example: in the timechart visualization, the user agent uses all columns with a numeric value that haven't been specified otherwise.
  - If the x-axis is an array, the values of each y-axis should also be an array of a similar length, with each y-axis occurring in a single column.
- Zero or more columns in the table represent a unique set of dimensions that group together the measures. These columns can be specified by the series property, or the user agent picks them automatically from the columns that are otherwise unspecified.
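For instance, the following is a minimal sketch, using the StormEvents sample table, that assigns these three roles explicitly instead of letting the user agent guess:
StormEvents
| summarize EventCount=count() by EventType, bin(StartTime, 7d)
| render timechart with (xcolumn=StartTime, ycolumns=EventCount, series=EventType) // x-axis, measure, and series set explicitly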
Related content
- Add a query visualization in the web UI
- Customize dashboard visuals
- Rendering examples in the tutorial
- Anomaly detection
Example
InsightsMetrics
| where Computer == "DC00.NA.contosohotels.com"
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
| render timechart
3 - Summarize operator
3.1 - Kusto partition & compose intermediate aggregation results
Suppose you want to calculate the count of distinct users every day over the last seven days. You can run summarize dcount(user)
once a day with a span filtered to the last seven days. This method is inefficient, because each time the calculation is run, there’s a six-day overlap with the previous calculation. You can also calculate an aggregate for each day, and then combine these aggregates. This method requires you to “remember” the last six results, but it’s much more efficient.
Partitioning queries as described is easy for simple aggregates, such as count()
and sum()
. It can also be useful for complex aggregates, such as dcount()
and percentiles()
. This article explains how Kusto supports such calculations.
The following examples show how to use hll
/tdigest
and demonstrate that using these commands is highly performant in some scenarios:
range x from 1 to 1000000 step 1
| summarize hll(x,4)
| project sizeInMb = estimate_data_size(hll_x) / pow(1024,2)
Output
sizeInMb |
---|
1.0000524520874 |
This hll object is just over 1 MB, which exceeds the default MaxValueSize of the column encoding policy, so ingesting it into a table without adjusting that policy ingests a null value:
.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)
MyTable
| project isempty(hll_x)
Output
Column1 |
---|
1 |
To avoid ingesting null, use the special encoding policy type bigobject
, which overrides the MaxValueSize
to 2 MB like this:
.alter column MyTable.hll_x policy encoding type='bigobject'
Ingesting a value now to the same table above:
.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)
ingests the second value successfully:
MyTable
| project isempty(hll_x)
Output
Column1 |
---|
1 |
0 |
Example: Count with binned timestamp
There’s a table, PageViewsHllTDigest
, containing hll
values of Pages viewed in each hour. You want these values binned to 12h
. Merge the hll
values using the hll_merge()
aggregate function, with the timestamp binned to 12h
. Use the function dcount_hll
to return the final dcount
value:
PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 12h)
| project Timestamp , dcount_hll(merged_hll)
Output
Timestamp | dcount_hll_merged_hll |
---|---|
2016-05-01 12:00:00.0000000 | 20056275 |
2016-05-02 00:00:00.0000000 | 38797623 |
2016-05-02 12:00:00.0000000 | 39316056 |
2016-05-03 00:00:00.0000000 | 13685621 |
To bin timestamp for 1d
:
PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 1d)
| project Timestamp , dcount_hll(merged_hll)
Output
Timestamp | dcount_hll_merged_hll |
---|---|
2016-05-01 00:00:00.0000000 | 20056275 |
2016-05-02 00:00:00.0000000 | 64135183 |
2016-05-03 00:00:00.0000000 | 13685621 |
The same query may be done over the values of tdigest
, which represent the BytesDelivered
in each hour:
PageViewsHllTDigest
| summarize merged_tdigests = merge_tdigest(tdigestBytesDel) by bin(Timestamp, 12h)
| project Timestamp , percentile_tdigest(merged_tdigests, 95, typeof(long))
Output
Timestamp | percentile_tdigest_merged_tdigests |
---|---|
2016-05-01 12:00:00.0000000 | 170200 |
2016-05-02 00:00:00.0000000 | 152975 |
2016-05-02 12:00:00.0000000 | 181315 |
2016-05-03 00:00:00.0000000 | 146817 |
Example: Temporary table
Kusto limits are reached with datasets that are too large, when you need to run periodic queries over the dataset but still want to calculate percentile() or dcount() over the full data.
To solve this problem, newly added data may be added to a temp table as hll
or tdigest
values using hll()
when the required operation is dcount
or tdigest()
when the required operation is percentile using set/append
or update policy
. In this case, the intermediate results of dcount
or tdigest
are saved into another dataset, which should be smaller than the target large one.
When you need to get the final results of these values, the queries may use hll
/tdigest
mergers: hll_merge()
/tdigest_merge()
. Then, after getting the merged values, percentile_tdigest()
/ dcount_hll()
may be invoked on these merged values to get the final result of dcount
or percentiles.
Assume there's a table, PageViews, into which data is ingested daily. Every day, you want to calculate the distinct count of pages viewed per minute for timestamps later than date = datetime(2016-05-01 18:00:00.0000000).
Run the following query:
PageViews
| where Timestamp > datetime(2016-05-01 18:00:00.0000000)
| summarize percentile(BytesDelivered, 90), dcount(Page,2) by bin(Timestamp, 1d)
Output
Timestamp | percentile_BytesDelivered_90 | dcount_Page |
---|---|---|
2016-05-01 00:00:00.0000000 | 83634 | 20056275 |
2016-05-02 00:00:00.0000000 | 82770 | 64135183 |
2016-05-03 00:00:00.0000000 | 72920 | 13685621 |
This query aggregates all the values every time it runs (for example, if you want to run it many times a day).
If you save the hll
and tdigest
values (which are the intermediate results of dcount
and percentile) into a temp table, PageViewsHllTDigest
, using an update policy or set/append commands, you may only merge the values and then use dcount_hll
/percentile_tdigest
using the following query:
PageViewsHllTDigest
| summarize percentile_tdigest(merge_tdigest(tdigestBytesDel), 90), dcount_hll(hll_merge(hllPage)) by bin(Timestamp, 1d)
Output
Timestamp | percentile_tdigest_merge_tdigests_tdigestBytesDel | dcount_hll_hll_merge_hllPage |
---|---|---|
2016-05-01 00:00:00.0000000 | 84224 | 20056275 |
2016-05-02 00:00:00.0000000 | 83486 | 64135183 |
2016-05-03 00:00:00.0000000 | 72247 | 13685621 |
This query should be more performant, as it runs over a smaller table. In this example, the first query runs over ~215M records, while the second one runs over just 32 records.
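As a sketch, assuming hourly bins and the hllPage and tdigestBytesDel column names used above, the PageViewsHllTDigest intermediate table could be populated with a set-or-append command like this:
.set-or-append PageViewsHllTDigest <| PageViews
| summarize hllPage=hll(Page), tdigestBytesDel=tdigest(BytesDelivered) by bin(Timestamp, 1h) // store intermediate aggregation state per hour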
Example: Intermediate results
The Retention Query. Assume you have a table that summarizes when each Wikipedia page was viewed (sample size is 10M), and you want to find, for each pair of dates (date1, date2), the percentage of pages viewed on both date1 and date2 relative to the pages viewed on date1 (date1 < date2).
The trivial way uses join and summarize operators:
// Get the total pages viewed each day
let totalPagesPerDay = PageViewsSample
| summarize by Page, Day = startofday(Timestamp)
| summarize count() by Day;
// Join the table to itself to get a grid where
// each row shows foreach page1, in which two dates
// it was viewed.
// Then count the pages between each two dates to
// get how many pages were viewed between date1 and date2.
PageViewsSample
| summarize by Page, Day1 = startofday(Timestamp)
| join kind = inner
(
PageViewsSample
| summarize by Page, Day2 = startofday(Timestamp)
)
on Page
| where Day2 > Day1
| summarize count() by Day1, Day2
| join kind = inner
totalPagesPerDay
on $left.Day1 == $right.Day
| project Day1, Day2, Percentage = count_*100.0/count_1
Output
Day1 | Day2 | Percentage |
---|---|---|
2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 34.0645725975255 |
2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.618368960101 |
2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.6291376489636 |
The above query took ~18 seconds.
When you use the hll()
, hll_merge()
, and dcount_hll()
functions, the equivalent query ends after ~1.3 seconds, showing that the hll
functions speed up the query above by ~14 times:
let Stats=PageViewsSample | summarize pagehll=hll(Page, 2) by day=startofday(Timestamp); // saving the hll values (intermediate results of the dcount values)
let day0=toscalar(Stats | summarize min(day)); // finding the min date over all dates.
let dayn=toscalar(Stats | summarize max(day)); // finding the max date over all dates.
let daycount=tolong((dayn-day0)/1d); // finding the range between max and min
Stats
| project idx=tolong((day-day0)/1d), day, pagehll
| mv-expand pidx=range(0, daycount) to typeof(long)
// Extend the column to get the dcount value from hll'ed values for each date (same as totalPagesPerDay from the above query)
| extend key1=iff(idx < pidx, idx, pidx), key2=iff(idx < pidx, pidx, idx), pages=dcount_hll(pagehll)
// For each two dates, merge the hll'ed values to get the total dcount over each two dates,
// This helps to get the pages viewed in both date1 and date2 (see the description below about the intersection_size)
| summarize (day1, pages1)=arg_min(day, pages), (day2, pages2)=arg_max(day, pages), union_size=dcount_hll(hll_merge(pagehll)) by key1, key2
| where day2 > day1
// To get pages viewed in date1 and also date2, look at the merged dcount of date1 and date2, subtract it from pages of date1 + pages on date2.
| project pages1, day1,day2, intersection_size=(pages1 + pages2 - union_size)
| project day1, day2, Percentage = intersection_size*100.0 / pages1
Output
day1 | day2 | Percentage |
---|---|---|
2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 33.2298494510578 |
2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.9773830213667 |
2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.5160020350006 |
3.2 - summarize operator
Produces a table that aggregates the content of the input table.
Syntax
T | summarize
[ SummarizeParameters ]
[[Column =
] Aggregation [,
…]]
[by
[Column =
] GroupExpression [,
…]]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Column | string | The name for the result column. Defaults to a name derived from the expression. | |
Aggregation | string | ✔️ | A call to an aggregation function such as count() or avg() , with column names as arguments. |
GroupExpression | scalar | ✔️ | A scalar expression that can reference the input data. The output will have as many records as there are distinct values of all the group expressions. |
SummarizeParameters | string | Zero or more space-separated parameters in the form of Name = Value that control the behavior. See supported parameters. |
Supported parameters
Name | Description |
---|---|
hint.num_partitions | Specifies the number of partitions used to share the query load on cluster nodes. See shuffle query |
hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. See shuffle query |
hint.strategy=shuffle | The shuffle strategy query shares the query load on cluster nodes, where each node will process one partition of the data. See shuffle query |
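For instance, a minimal sketch of the shuffle strategy, assuming the StormEvents sample table:
StormEvents
| summarize hint.strategy=shuffle count() by EventType, State // each node processes one partition of the data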
Returns
The input rows are arranged into groups having the same values of the by
expressions. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by
columns and also at least one column for each computed aggregate. (Some aggregation functions return multiple columns.)
The result has as many rows as there are distinct combinations of by
values
(which may be zero). If there are no group keys provided, the result has a single
record.
To summarize over ranges of numeric values, use bin()
to reduce ranges to discrete values.
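For example, a minimal sketch, assuming the StormEvents sample table, that buckets numeric property-damage values into ranges of 10,000:
StormEvents
| summarize EventCount=count() by bin(DamageProperty, 10000) // group continuous values into discrete buckets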
Default values of aggregations
The following table summarizes the default values of aggregations:
Operator | Default value |
---|---|
count() , countif() , dcount() , dcountif() , count_distinct() , sum() , sumif() , variance() , varianceif() , stdev() , stdevif() | 0 |
make_bag() , make_bag_if() , make_list() , make_list_if() , make_set() , make_set_if() | empty dynamic array ([]) |
All others | null |
Examples
The examples in this section show how to use the syntax to help you get started.
Unique combination
The following query determines what unique combinations of State
and EventType
there are for storms that resulted in direct injury. There are no aggregation functions, just group-by keys. The output will just show the columns for those results.
StormEvents
| where InjuriesDirect > 0
| summarize by State, EventType
Output
The following table shows only the first 5 rows. To see the full output, run the query.
State | EventType |
---|---|
TEXAS | Thunderstorm Wind |
TEXAS | Flash Flood |
TEXAS | Winter Weather |
TEXAS | High Wind |
TEXAS | Flood |
… | … |
Minimum and maximum timestamp
Finds the minimum and maximum duration of heavy rain storms in Hawaii. There's no group-by clause, so there's just one row in the output.
StormEvents
| where State == "HAWAII" and EventType == "Heavy Rain"
| project Duration = EndTime - StartTime
| summarize Min = min(Duration), Max = max(Duration)
Output
Min | Max |
---|---|
01:08:00 | 11:55:00 |
Distinct count
The following query calculates the number of unique storm event types for each state and sorts the results by the number of unique storm types:
StormEvents
| summarize TypesOfStorms=dcount(EventType) by State
| sort by TypesOfStorms
Output
The following table shows only the first 5 rows. To see the full output, run the query.
State | TypesOfStorms |
---|---|
TEXAS | 27 |
CALIFORNIA | 26 |
PENNSYLVANIA | 25 |
GEORGIA | 24 |
ILLINOIS | 23 |
… | … |
Histogram
The following example calculates a histogram of storm event types that had storms lasting longer than 1 day. Because Duration
has many values, use bin()
to group its values into 1-day intervals.
StormEvents
| project EventType, Duration = EndTime - StartTime
| where Duration > 1d
| summarize EventCount=count() by EventType, Length=bin(Duration, 1d)
| sort by Length
Output
EventType | Length | EventCount |
---|---|---|
Drought | 30.00:00:00 | 1646 |
Wildfire | 30.00:00:00 | 11 |
Heat | 30.00:00:00 | 14 |
Flood | 30.00:00:00 | 20 |
Heavy Rain | 29.00:00:00 | 42 |
… | … | … |
Aggregates default values
When the input of summarize
operator has at least one empty group-by key, its result is empty, too.
When the input of summarize
operator doesn’t have an empty group-by key, the result is the default values of the aggregates used in the summarize
For more information, see Default values of aggregations.
datatable(x:long)[]
| summarize any_x=take_any(x), arg_max_x=arg_max(x, *), arg_min_x=arg_min(x, *), avg(x), buildschema(todynamic(tostring(x))), max(x), min(x), percentile(x, 55), hll(x) ,stdev(x), sum(x), sumif(x, x > 0), tdigest(x), variance(x)
Output
any_x | arg_max_x | arg_min_x | avg_x | schema_x | max_x | min_x | percentile_x_55 | hll_x | stdev_x | sum_x | sumif_x | tdigest_x | variance_x |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | | | NaN | | | | | | 0 | 0 | 0 | | 0 |
The result of avg(x)
is NaN
due to dividing by 0.
datatable(x:long)[]
| summarize count(x), countif(x > 0) , dcount(x), dcountif(x, x > 0)
Output
count_x | countif_ | dcount_x | dcountif_x |
---|---|---|---|
0 | 0 | 0 | 0 |
datatable(x:long)[]
| summarize make_set(x), make_list(x)
Output
set_x | list_x |
---|---|
[] | [] |
The avg aggregate sums all the non-null values and counts only the values that participated in the calculation (nulls aren't taken into account).
range x from 1 to 4 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize sum(y), avg(y)
Output
sum_y | avg_y |
---|---|
15 | 5 |
The regular count will count nulls:
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize count(y)
Output
count_y |
---|
2 |
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize make_set(y), make_set(y)
Output
set_y | set_y1 |
---|---|
[5.0] | [5.0] |
4 - as operator
Binds a name to the operator's input tabular expression. This operator allows the query to reference the value of the tabular expression multiple times without breaking the query apart and binding a name through a let statement.
To optimize multiple uses of the as
operator within a single query, see Named expressions.
Syntax
T |
as
[hint.materialized
=
Materialized] Name
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular expression to rename. |
Name | string | ✔️ | The temporary name for the tabular expression. |
hint.materialized | bool | If Materialized is set to true , the value of the tabular expression output is wrapped by a materialize() function call. Otherwise, the value is recalculated on every reference. |
Examples
In the following two examples, the generated TableName column consists of ‘T1’ and ‘T2’.
range x from 1 to 5 step 1
| as T1
| union withsource=TableName (range x from 1 to 5 step 1 | as T2)
Alternatively, you can write the same example as follows:
union withsource=TableName (range x from 1 to 5 step 1 | as T1), (range x from 1 to 5 step 1 | as T2)
Output
TableName | x |
---|---|
T1 | 1 |
T1 | 2 |
T1 | 3 |
T1 | 4 |
T1 | 5 |
T2 | 1 |
T2 | 2 |
T2 | 3 |
T2 | 4 |
T2 | 5 |
In the following example, the 'left side' of the join is:
MyLogTable filtered by type == "Event" and Name == "Start"
and the 'right side' of the join is:
MyLogTable filtered by type == "Event" and Name == "Stop"
MyLogTable
| where type == "Event"
| as T
| where Name == "Start"
| join (
T
| where Name == "Stop"
) on ActivityId
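When the bound name is referenced more than once and recalculating it is expensive, the hint.materialized parameter can wrap the input in a materialize() call. A sketch of the same join with materialization enabled:
MyLogTable
| where type == "Event"
| as hint.materialized=true T // T is evaluated once and cached for later references
| where Name == "Start"
| join (
    T
    | where Name == "Stop"
) on ActivityId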
5 - consume operator
Consumes the tabular data stream handed to the operator.
The consume
operator is mostly used for triggering the query side-effect without actually returning
the results back to the caller.
The consume
operator can be used for estimating the
cost of a query without actually delivering the results back to the client.
(The estimation isn’t exact for various reasons; for example, consume
is calculated distributively, so T | consume
won’t transmit the table’s
data between the nodes of the cluster.)
Syntax
consume
[decodeblocks
=
DecodeBlocks]
Parameters
Name | Type | Required | Description |
---|---|---|---|
DecodeBlocks | bool | If set to true , or if the request property perftrace is set to true , the consume operator won’t just enumerate the records at its input, but actually force each value in those records to be decompressed and decoded. |
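Example
A minimal sketch, assuming the StormEvents sample table, that scans the table and discards the results; this can help estimate the query cost:
StormEvents
| consume // nothing is returned to the client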
6 - count operator
Returns the number of records in the input record set.
Syntax
T |
count
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose records are to be counted. |
Returns
This function returns a table with a single record and column of type
long
. The value of the only cell is the number of records in T.
Example
When you use the count operator with a table name, like StormEvents, it will return the total number of records in that table.
StormEvents | count
Output
Count |
---|
59066 |
Related content
For information about the count() aggregation function, see count() (aggregation function).
7 - datatable operator
Returns a table whose schema and values are defined in the query itself.
Syntax
datatable(
ColumnName :
ColumnType [,
…])
[
ScalarValue [,
…] ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | ✔️ | The name for a column. |
ColumnType | string | ✔️ | The type of data in the column. |
ScalarValue | scalar | ✔️ | The value to insert into the table. The total number of values must be a multiple of the number of columns in the table. Each value is assigned to a column based on its position. Specifically, the n’th value is assigned to the column at position n % NumColumns, where NumColumns is the total number of columns. |
Returns
This operator returns a data table of the given schema and data.
Example
This example creates a table with Date, Event, and MoreData columns, filters rows with Event descriptions longer than 4 characters, and adds a new column key2 to each row from the MoreData dynamic object.
datatable(Date:datetime, Event:string, MoreData:dynamic) [
datetime(1910-06-11), "Born", dynamic({"key1":"value1", "key2":"value2"}),
datetime(1930-01-01), "Enters Ecole Navale", dynamic({"key1":"value3", "key2":"value4"}),
datetime(1953-01-01), "Published first book", dynamic({"key1":"value5", "key2":"value6"}),
datetime(1997-06-25), "Died", dynamic({"key1":"value7", "key2":"value8"}),
]
| where strlen(Event) > 4
| extend key2 = MoreData.key2
Output
Date | Event | MoreData | key2 |
---|---|---|---|
1930-01-01 00:00:00.0000000 | Enters Ecole Navale | { “key1”: “value3”, “key2”: “value4” } | value4 |
1953-01-01 00:00:00.0000000 | Published first book | { “key1”: “value5”, “key2”: “value6” } | value6 |
8 - distinct operator
Produces a table with the distinct combination of the provided columns of the input table.
Syntax
T | distinct
ColumnName[,
ColumnName2, ...]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | ✔️ | The column name to search for distinct values. |
Example
Shows the distinct combinations of state and event type that led to over 45 direct injuries.
StormEvents
| where InjuriesDirect > 45
| distinct State, EventType
Output
State | EventType |
---|---|
TEXAS | Winter Weather |
KANSAS | Tornado |
MISSOURI | Excessive Heat |
OKLAHOMA | Thunderstorm Wind |
OKLAHOMA | Excessive Heat |
ALABAMA | Tornado |
ALABAMA | Heat |
TENNESSEE | Heat |
CALIFORNIA | Wildfire |
Related content
If the group by keys are of high cardinalities, try summarize by ...
with the shuffle strategy.
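A minimal sketch of that alternative, assuming the StormEvents sample table:
StormEvents
| summarize hint.strategy=shuffle by State, EventType // distinct combinations via summarize with the shuffle strategy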
9 - evaluate plugin operator
Invokes a service-side query extension (plugin).
The evaluate
operator is a tabular operator that allows you to invoke query language extensions known as plugins. Unlike other language constructs, plugins can be enabled or disabled. Plugins aren’t “bound” by the relational nature of the language. In other words, they may not have a predefined, statically determined, output schema.
Syntax
[T |
] evaluate
[ evaluateParameters ] PluginName (
[ PluginArgs ])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | A tabular input to the plugin. Some plugins don’t take any input and act as a tabular data source. | |
evaluateParameters | string | Zero or more space-separated evaluate parameters in the form of Name = Value that control the behavior of the evaluate operation and execution plan. Each plugin may decide differently how to handle each parameter. Refer to each plugin’s documentation for specific behavior. | |
PluginName | string | ✔️ | The mandatory name of the plugin being invoked. |
PluginArgs | string | Zero or more comma-separated arguments to provide to the plugin. |
Evaluate parameters
The following parameters are supported:
Name | Values | Description |
---|---|---|
hint.distribution | single , per_node , per_shard | Distribution hints |
hint.pass_filters | true , false | Allow evaluate operator to passthrough any matching filters before the plugin. Filter is considered as ‘matched’ if it refers to a column existing before the evaluate operator. Default: false |
hint.pass_filters_column | column_name | Allow plugin operator to passthrough filters referring to column_name before the plugin. Parameter can be used multiple times with different column names. |
Plugins
The following plugins are supported:
- autocluster plugin
- azure-digital-twins-query-request plugin
- bag-unpack plugin
- basket plugin
- cosmosdb-sql-request plugin
- dcount-intersect plugin
- diffpatterns plugin
- diffpatterns-text plugin
- infer-storage-schema plugin
- ipv4-lookup plugin
- ipv6-lookup plugin
- mysql-request-plugin
- narrow plugin
- pivot plugin
- preview plugin
- R plugin
- rolling-percentile plugin
- rows-near plugin
- schema-merge plugin
- sql-request plugin
- sequence-detect plugin
Distribution hints
Distribution hints specify how the plugin execution will be distributed across multiple cluster nodes. Each plugin may implement a different support for the distribution. The plugin’s documentation specifies the distribution options supported by the plugin.
Possible values:
single
: A single instance of the plugin will run over the entire query data.per_node
: If the query before the plugin call is distributed across nodes, then an instance of the plugin will run on each node over the data that it contains.per_shard
: If the data before the plugin call is distributed across shards, then an instance of the plugin will run over each shard of the data.
10 - extend operator
Creates calculated columns and appends them to the result set.
Syntax
T | extend
[ColumnName | (
ColumnName[,
…])
=
] Expression [,
…]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Tabular input to extend. |
ColumnName | string | Name of the column to add or update. | |
Expression | string | ✔️ | Calculation to perform over the input. |
- If ColumnName is omitted, the output column name of Expression is automatically generated.
- If Expression returns more than one column, a list of column names can be specified in parentheses. Then, Expression's output columns are given the specified names. If a list of the column names isn't specified, all of Expression's output columns with generated names are added to the output.
Returns
A copy of the input tabular result set, such that:
- Column names noted by
extend
that already exist in the input are removed and appended as their new calculated values. - Column names noted by
extend
that don’t exist in the input are appended as their new calculated values.
The extend operator adds a new column to the input result set, and that column does
not have an index. In most cases, if the new column is set to be exactly
the same as an existing table column that has an index, Kusto can automatically
use the existing index. However, in some complex scenarios this propagation is
not done. In such cases, if the goal is to rename a column, use the project-rename
operator instead.
Example
StormEvents
| project EndTime, StartTime
| extend Duration = EndTime - StartTime
The following table shows only the first 10 results. To see the full output, run the query.
EndTime | StartTime | Duration |
---|---|---|
2007-01-01T00:00:00Z | 2007-01-01T00:00:00Z | 00:00:00 |
2007-01-01T00:25:00Z | 2007-01-01T00:25:00Z | 00:00:00 |
2007-01-01T02:24:00Z | 2007-01-01T02:24:00Z | 00:00:00 |
2007-01-01T03:45:00Z | 2007-01-01T03:45:00Z | 00:00:00 |
2007-01-01T04:35:00Z | 2007-01-01T04:35:00Z | 00:00:00 |
2007-01-01T04:37:00Z | 2007-01-01T03:37:00Z | 01:00:00 |
2007-01-01T05:00:00Z | 2007-01-01T00:00:00Z | 05:00:00 |
2007-01-01T05:00:00Z | 2007-01-01T00:00:00Z | 05:00:00 |
2007-01-01T06:00:00Z | 2007-01-01T00:00:00Z | 06:00:00 |
2007-01-01T06:00:00Z | 2007-01-01T00:00:00Z | 06:00:00 |
Related content
- Use series_stats to return multiple columns
11 - externaldata operator
The externaldata
operator returns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.
Syntax
externaldata
(
columnName:
columnType [,
…] )
[
storageConnectionString [,
…] ]
[with
(
propertyName =
propertyValue [,
…])
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
columnName, columnType | string | ✔️ | A list of column names and their types. This list defines the schema of the table. |
storageConnectionString | string | ✔️ | A storage connection string of the storage artifact to query. |
propertyName, propertyValue | string | A list of optional supported properties that determines how to interpret the data retrieved from storage. |
Supported properties
Property | Type | Description |
---|---|---|
format | string | The data format. If unspecified, an attempt is made to detect the data format from file extension. The default is CSV . All ingestion data formats are supported. |
ignoreFirstRecord | bool | If set to true , the first record in every file is ignored. This property is useful when querying CSV files with headers. |
ingestionMapping | string | Indicates how to map data from the source file to the actual columns in the operator result set. See data mappings. |
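For instance, a minimal sketch combining these properties to read a headered CSV file; the URL and schema below are only placeholders:
externaldata(Name: string, Age: long)
[
    h@"https://storageaccount.blob.core.windows.net/container/people.csv?...SAS..."
]
with(format="csv", ignoreFirstRecord=true)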
Returns
The externaldata
operator returns a data table of the given schema whose data was parsed from the specified storage artifact, indicated by the storage connection string.
Examples
The examples query data in an external storage file.
Fetch a list of user IDs stored in Azure Blob Storage
The following example shows how to find all records in a table whose UserID
column falls into a known set of IDs, held (one per line) in an external storage file. Since the data format isn’t specified, the detected data format is TXT
.
Users
| where UserID in ((externaldata (UserID:string) [
@"https://storageaccount.blob.core.windows.net/storagecontainer/users.txt"
h@"?...SAS..." // Secret token needed to access the blob
]))
| ...
Query multiple data files
The following example queries multiple data files stored in external storage.
externaldata(Timestamp:datetime, ProductId:string, ProductDescription:string)
[
h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/01/part-00000-7e967c99-cf2b-4dbb-8c53-ce388389470d.csv.gz?...SAS...",
h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/02/part-00000-ba356fa4-f85f-430a-8b5a-afd64f128ca4.csv.gz?...SAS...",
h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/03/part-00000-acb644dc-2fc6-467c-ab80-d1590b23fc31.csv.gz?...SAS..."
]
with(format="csv")
| summarize count() by ProductId
The above example can be thought of as a quick way to query multiple data files without defining an external table.
Query hierarchical data formats
To query hierarchical data formats, such as JSON, Parquet, Avro, or ORC, ingestionMapping must be specified in the operator properties.
In this example, there’s a JSON file stored in Azure Blob Storage with the following contents:
{
"timestamp": "2019-01-01 10:00:00.238521",
"data": {
"tenant": "e1ef54a6-c6f2-4389-836e-d289b37bcfe0",
"method": "RefreshTableMetadata"
}
}
{
"timestamp": "2019-01-01 10:00:01.845423",
"data": {
"tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324",
"method": "GetFileList"
}
}
...
To query this file using the externaldata
operator, a data mapping must be specified. The mapping dictates how to map JSON fields to the operator result set columns:
externaldata(Timestamp: datetime, TenantId: guid, MethodName: string)
[
h@'https://mycompanystorage.blob.core.windows.net/events/2020/09/01/part-0000046c049c1-86e2-4e74-8583-506bda10cca8.json?...SAS...'
]
with(format='multijson', ingestionMapping='[{"Column":"Timestamp","Properties":{"Path":"$.timestamp"}},{"Column":"TenantId","Properties":{"Path":"$.data.tenant"}},{"Column":"MethodName","Properties":{"Path":"$.data.method"}}]')
The MultiJSON format is used here because individual JSON records span multiple lines.
Related content
For more info on mapping syntax, see data mappings.
12 - facet operator
Returns a set of tables, one for each column specified in the facet clause.
Each table contains the list of values taken by its column.
An additional table can be created by using the with
clause. Facet result tables can’t be renamed or referenced by any additional operators.
Syntax
T | facet by ColumnName [, ColumnName2, …] [with ( filterPipe )]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | ✔️ | The column name, or list of column names, to be summarized. |
filterPipe | string | A query expression applied to the input table. |
Returns
Multiple tables: one for the with
clause, and one for each column.
Example
StormEvents
| where State startswith "A" and EventType has "Heavy"
| facet by State, EventType
with
(
where StartTime between(datetime(2007-01-04) .. 7d)
| project State, StartTime, Source, EpisodeId, EventType
| take 5
)
The following is the table generated by the with
clause.
State | StartTime | Source | EpisodeId | EventType |
---|---|---|---|---|
ALASKA | 2007-01-04 12:00:00.0000000 | COOP Observer | 2192 | Heavy Snow |
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow |
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow |
ALASKA | 2007-01-04 15:00:00.0000000 | Trained Spotter | 2192 | Heavy Snow |
ALASKA | 2007-01-06 18:00:00.0000000 | COOP Observer | 2193 | Heavy Snow |
The following table is the State
facet output table.
State | count_State |
---|---|
ALABAMA | 19 |
ARIZONA | 33 |
ARKANSAS | 1 |
AMERICAN SAMOA | 1 |
ALASKA | 58 |
The following table is the EventType
facet output table.
EventType | count_EventType |
---|---|
Heavy Rain | 34 |
Heavy Snow | 78 |
13 - find operator
Finds rows that match a predicate across a set of tables.
The scope of the find
operator can also be cross-database or cross-cluster.
find in (Table1, Table2, Table3) where Fruit=="apple"
find in (database('*').*) where Fruit == "apple"
find in (cluster('cluster_name').database('MyDB*').*) where Fruit == "apple"
Syntax
find [withsource= ColumnName] [in ( Tables )] where Predicate [project-smart | project ColumnName [: ColumnType] [, …] [, pack_all()]]

find Predicate [project-smart | project ColumnName [: ColumnType] [, …] [, pack_all()]]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | By default, the output includes a column called source_ whose values indicate which source table contributed to each row. If specified, ColumnName is used instead of source_. After wildcard matching, if the query references tables from more than one database including the default database, the value of this column has a table name qualified with the database. Similarly cluster and database qualifications are present in the value if more than one cluster is referenced. | |
Predicate | bool | ✔️ | This boolean expression is evaluated for each row in each input table. For more information, see predicate-syntax details. |
Tables | string | Zero or more comma-separated table references. By default, find looks in all the tables in the current database. You can use: 1. The name of a table, such as Events. 2. A query expression, such as (Events \| where id==42). 3. A set of tables specified with a wildcard. For example, E* would form the union of all the tables whose names begin with E. |
project-smart or project | string | If not specified, project-smart is used by default. For more information, see output-schema details. |
Returns
Transformation of rows in Table [,
Table, …] for which Predicate is true
. The rows are transformed according to the output schema.
Output schema
source_ column
The find
operator output always includes a source_ column with the source table name. The column can be renamed using the withsource
parameter.
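For instance, a minimal sketch of renaming the source column with withsource; EventsTable1 and EventsTable2 are the sample tables used later in this article:
find withsource=SourceTable in (EventsTable1, EventsTable2) where Level == 'Error'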
results columns
Source tables that don't contain any column used by the predicate evaluation are filtered out.
When you use project-smart
, the columns that appear in the output are:
- Columns that appear explicitly in the predicate.
- Columns that are common to all the filtered tables.
The rest of the columns are packed into a property bag and appear in an extra pack
column.
A column that is referenced explicitly by the predicate and appears in multiple tables with multiple types, has a different column in the result schema for each such type. Each of the column names is constructed from the original column name and type, separated by an underscore.
When using project
ColumnName[:
ColumnType ,
… ] [,
pack_all()
]:
- The result table includes the columns specified in the list. If a source table doesn’t contain a certain column, the values in the corresponding rows are null.
- When you specify a ColumnType with a ColumnName, this column in the “result” has the given type, and the values are cast to that type if needed. The casting doesn’t have an effect on the column type when evaluating the Predicate.
- When
pack_all()
is used, all the columns, including the projected columns, are packed into a property bag and appear in an extra column, by default ‘column1’. In the property bag, the source column name serves as the property name and the column’s value serves as the property value.
Predicate syntax
The find operator supports an alternative syntax for the * has term: using just term searches for the term across all input columns.
For a summary of some filtering functions, see where operator.
Considerations
- If the project clause references a column that appears in multiple tables and has multiple types, a type must follow this column reference in the project clause.
- If a column appears in multiple tables and has multiple types and project-smart is in use, there's a corresponding column for each type in the find's result, as described in union.
- When you use project-smart, changes in the predicate, in the source tables set, or in the tables schema, might result in a change to the output schema. If a constant result schema is needed, use project instead.
- find scope can't include functions. To include a function in the find scope, define a let statement with the view keyword.
Performance tips
- Use tables as opposed to tabular expressions. If a tabular expression is used, the find operator falls back to a union query, which can result in degraded performance.
- If a column that appears in multiple tables and has multiple types is part of the project clause, prefer adding a ColumnType to the project clause over modifying the table before passing it to find.
- Add time-based filters to the predicate. Use a datetime column value or ingestion_time(). (See the sketch after this list.)
- Search in specific columns rather than doing a full-text search.
- It's better not to reference columns that appear in multiple tables and have multiple types. If the predicate is valid when resolving such columns' type for more than one type, the query falls back to union.
For example, see examples of cases where
find
acts as a union.
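For instance, a minimal sketch combining a time-based filter with a column-scoped search; T1 and T2 are hypothetical tables assumed to have Timestamp and Message columns:
find in (T1, T2) where Timestamp > ago(1d) and Message has "timeout"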
Examples
Term lookup across all tables
The query finds all rows from all tables in the current database in which any column includes the word Hernandez
. The resulting records are transformed according to the output schema. The output includes rows from the Customers
table and the SalesTable
table of the ContosoSales
database.
find "Hernandez"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
Customers | {“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:5023,“Education”:“Partial High School”,“FirstName”:“Devin”,“Gender”:“M”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Clerical”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”} |
Customers | {“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:7814,“Education”:“Partial College”,“FirstName”:“Kristy”,“Gender”:“F”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Professional”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”} |
Customers | {“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:7888,“Education”:“Partial High School”,“FirstName”:“Kari”,“Gender”:“F”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Clerical”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”} |
… | … |
Term lookup across all tables matching a name pattern
The query finds all rows from all tables in the current database whose name starts with C
, and in which any column includes the word Hernandez
. The resulting records are transformed according to the output schema. Now, the output only contains records from the Customers
table.
find in (C*) where * has "Hernandez"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
ConferenceSessions | {“conference”:“Ignite 2018”,“sessionid”:“THR3115”,“session_title”:“Azure Log Analytics: Deep dive into the Azure Kusto query language. “,“session_type”:“Theater”,“owner”:“Jean Francois Berenguer”,“participants”:””,“URL”:“https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"} |
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
… | … |
Term lookup across the cluster
The query finds all rows from all tables in all databases in the cluster in which any column includes the word Kusto
.
This query is a cross-database query.
The resulting records are transformed according to the output schema.
find in (database('*').*) where * has "Kusto"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
database(“Samples”).ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
database(“Samples”).ConferenceSessions | {“conference”:“Ignite 2018”,“sessionid”:“THR3115”,“session_title”:“Azure Log Analytics: Deep dive into the Azure Kusto query language. “,“session_type”:“Theater”,“owner”:“Jean Francois Berenguer”,“participants”:””,“URL”:“https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"} |
database(“Samples”).ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
… | … |
Term lookup matching a name pattern in the cluster
The query finds all rows from all tables whose name starts with C in all databases whose name starts with S, and in which any column includes the word Kusto.
The resulting records are transformed according to the output schema.
find in (database("S*").C*) where * has "Kusto"
Output
This table shows the first three rows of the output.
source_ | pack_ |
---|---|
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
ConferenceSessions | {“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”} |
… | … |
Term lookup in several clusters
The query finds all rows from all tables whose name starts with K
in all databases whose name start with B
and in which any column includes the word Kusto
.
The resulting records are transformed according to the output schema.
find in (cluster("cluster1").database("B*").K*, cluster("cluster2").database("C*").*)
where * has "Kusto"
Term lookup across all tables
The query finds all rows from all tables in which any column includes the word Kusto
.
The resulting records are transformed according to the output schema.
find "Kusto"
Examples of find
output results
The following examples show how find
can be used over two tables: EventsTable1 and EventsTable2.
Assume we have the following content in these two tables:
EventsTable1
Session_Id | Level | EventText | Version |
---|---|---|---|
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Text1 | v1.0.0 |
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Text2 | v1.0.0 |
28b8e46e-3c31-43cf-83cb-48921c3986fc | Error | Some Text3 | v1.0.1 |
8f057b11-3281-45c3-a856-05ebb18a3c59 | Information | Some Text4 | v1.1.0 |
EventsTable2
Session_Id | Level | EventText | EventName |
---|---|---|---|
f7d5f95f-f580-4ea6-830b-5776c8d64fdd | Information | Some Other Text1 | Event1 |
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Other Text2 | Event2 |
acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Other Text3 | Event3 |
15eaeab5-8576-4b58-8fc6-478f75d8fee4 | Error | Some Other Text4 | Event4 |
Search in common columns, project common and uncommon columns, and pack the rest
The query searches for specific records in EventsTable1 and EventsTable2 based on a given Session_Id and an Error Level. It then projects three specific columns: EventText, Version, and EventName, and packs all other remaining columns into a dynamic object.
find in (EventsTable1, EventsTable2)
where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' and Level == 'Error'
project EventText, Version, EventName, pack_all()
Output
source_ | EventText | Version | EventName | pack_ |
---|---|---|---|---|
EventsTable1 | Some Text2 | v1.0.0 | | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”} |
EventsTable2 | Some Other Text3 | | Event3 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”} |
Search in common and uncommon columns
The query searches for records that either have Version as ‘v1.0.0’ or EventName as ‘Event1’, and then it projects (selects) four specific columns: Session_Id, EventText, Version, and EventName from those filtered results.
find Version == 'v1.0.0' or EventName == 'Event1' project Session_Id, EventText, Version, EventName
Output
source_ | Session_Id | EventText | Version | EventName |
---|---|---|---|---|
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Some Text1 | v1.0.0 | |
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Some Text2 | v1.0.0 | |
EventsTable2 | f7d5f95f-f580-4ea6-830b-5776c8d64fdd | Some Other Text1 | | Event1 |
Use abbreviated notation to search across all tables in the current database
This query searches the database for any records with a Session_Id that matches ‘acbd207d-51aa-4df7-bfa7-be70eb68f04e’. It retrieves records from all tables and columns that contain this specific Session_Id.
find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'
Output
source_ | Session_Id | Level | EventText | pack_ |
---|---|---|---|---|
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Text1 | {“Version”:“v1.0.0”} |
EventsTable1 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Text2 | {“Version”:“v1.0.0”} |
EventsTable2 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Information | Some Other Text2 | {“EventName”:“Event2”} |
EventsTable2 | acbd207d-51aa-4df7-bfa7-be70eb68f04e | Error | Some Other Text3 | {“EventName”:“Event3”} |
Return the results from each row as a property bag
This query searches the database for records with the specified Session_Id and returns all columns of those records as a single dynamic object.
find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' project pack_all()
Output
source_ | pack_ |
---|---|
EventsTable1 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Information”, “EventText”:“Some Text1”, “Version”:“v1.0.0”} |
EventsTable1 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”, “EventText”:“Some Text2”, “Version”:“v1.0.0”} |
EventsTable2 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Information”, “EventText”:“Some Other Text2”, “EventName”:“Event2”} |
EventsTable2 | {“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”, “EventText”:“Some Other Text3”, “EventName”:“Event3”} |
Examples of cases where find
acts as union
The find
operator in Kusto can sometimes act like a union
operator, mainly when it’s used to search across multiple tables.
Using a non-table expression as a find operand
The query first creates a view that filters EventsTable1 to only include error-level records. Then, it searches within this filtered view and the EventsTable2 table for records with a specific Session_Id.
let PartialEventsTable1 = view() { EventsTable1 | where Level == 'Error' };
find in (PartialEventsTable1, EventsTable2)
where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'
Referencing a column that appears in multiple tables and has multiple types
For this example, create two tables by running:
.create tables
Table1 (Level:string, Timestamp:datetime, ProcessId:string),
Table2 (Level:string, Timestamp:datetime, ProcessId:int64)
- The following query is executed as
union
.
find in (Table1, Table2) where ProcessId == 1001
The output result schema is (Level:string, Timestamp, ProcessId_string, ProcessId_int).
- The following query is executed as
union
, but produces a different result schema.
find in (Table1, Table2) where ProcessId == 1001 project Level, Timestamp, ProcessId:string
The output result schema is (Level:string, Timestamp, ProcessId_string)
14 - fork operator
Runs multiple consumer operators in parallel.
Syntax
T | fork [name=]( subquery ) [name=]( subquery ) …
Parameters
Name | Type | Required | Description |
---|---|---|---|
subquery | string | ✔️ | A downstream pipeline of supported query operators. |
name | string | A temporary name for the subquery result table. |
Supported query operators
as
count
extend
parse
where
take
project
project-away
project-keep
project-rename
project-reorder
summarize
top
top-nested
sort
mv-expand
reduce
Returns
Multiple result tables, one for each of the subquery arguments.
Tips
- Use materialize as a replacement for join or union on fork legs. The input stream is cached by materialize, and then the cached expression can be used in the join/union legs.
- Use batch with materialize of tabular expression statements instead of the fork operator (see the sketch after these tips).
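For instance, a minimal sketch of the batch-with-materialize alternative, caching the filtered input once and returning two result tables from a single batch; the let name is only illustrative:
let FloridaStorms = materialize(StormEvents | where State == "FLORIDA");
FloridaStorms | where DeathsDirect + DeathsIndirect > 1;
FloridaStorms | where InjuriesDirect + InjuriesIndirect > 1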
Examples
The examples output multiple tables, with named and unnamed result tables.
Unnamed subqueries
StormEvents
| where State == "FLORIDA"
| fork
( where DeathsDirect + DeathsIndirect > 1)
( where InjuriesDirect + InjuriesIndirect > 1)
Output
This output shows the first few rows and columns of the result table.
GenericResult
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-03-13T08:20:00Z | 2007-03-13T08:20:00Z | 4094 | 22961 | FLORIDA | Dense Fog | 3 | 0 |
2007-09-11T15:26:00Z | 2007-09-11T15:26:00Z | 9578 | 53798 | FLORIDA | Rip Current | 0 | 0 |
GenericResult
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:10:00Z | 2007-02-02T03:16:00Z | 2545 | 17515 | FLORIDA | Tornado | 15 | 0 |
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-02-02T03:55:00Z | 2007-02-02T04:10:00Z | 3464 | 20318 | FLORIDA | Tornado | 42 | 0 |
Named subqueries
In the following examples, the result tables are named “StormsWithDeaths” and “StormsWithInjuries”.
StormEvents
| where State == "FLORIDA"
| fork
(where DeathsDirect + DeathsIndirect > 1 | as StormsWithDeaths)
(where InjuriesDirect + InjuriesIndirect > 1 | as StormsWithInjuries)
StormEvents
| where State == "FLORIDA"
| fork
StormsWithDeaths = (where DeathsDirect + DeathsIndirect > 1)
StormsWithInjuries = (where InjuriesDirect + InjuriesIndirect > 1)
Output
This output shows the first few rows and columns of the result table.
StormsWithDeaths
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-03-13T08:20:00Z | 2007-03-13T08:20:00Z | 4094 | 22961 | FLORIDA | Dense Fog | 3 | 0 |
2007-09-11T15:26:00Z | 2007-09-11T15:26:00Z | 9578 | 53798 | FLORIDA | Rip Current | 0 | 0 |
StormsWithInjuries
StartTime | EndTime | EpisodeId | EventId | State | EventType | InjuriesDirect | InjuriesIndirect |
---|---|---|---|---|---|---|---|
2007-02-02T03:10:00Z | 2007-02-02T03:16:00Z | 2545 | 17515 | FLORIDA | Tornado | 15 | 0 |
2007-02-02T03:17:00Z | 2007-02-02T03:25:00Z | 3464 | 18948 | FLORIDA | Tornado | 10 | 0 |
2007-02-02T03:37:00Z | 2007-02-02T03:55:00Z | 3464 | 18950 | FLORIDA | Tornado | 9 | 0 |
2007-02-02T03:55:00Z | 2007-02-02T04:10:00Z | 3464 | 20318 | FLORIDA | Tornado | 42 | 0 |
SamplePowerRequirementHistorizedData
| fork
Dataset2 = (where twinId <> "p_sol_01" | summarize count() by twinId, name)
Dataset3 = (summarize count() by WeekOfYear = week_of_year(timestamp))
It is possible to use almost all the known features of the KQL language inside each fork subquery. However, the join operator doesn't work inside a subquery; it isn't allowed by the engine.
15 - getschema operator
Produces a table that represents the tabular schema of the input.
Syntax
T |
getschema
Example
StormEvents
| getschema
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
StartTime | 0 | System.DateTime | datetime |
EndTime | 1 | System.DateTime | datetime |
EpisodeId | 2 | System.Int32 | int |
EventId | 3 | System.Int32 | int |
State | 4 | System.String | string |
EventType | 5 | System.String | string |
InjuriesDirect | 6 | System.Int32 | int |
InjuriesIndirect | 7 | System.Int32 | int |
DeathsDirect | 8 | System.Int32 | int |
DeathsIndirect | 9 | System.Int32 | int |
DamageProperty | 10 | System.Int32 | int |
DamageCrops | 11 | System.Int32 | int |
Source | 12 | System.String | string |
BeginLocation | 13 | System.String | string |
EndLocation | 14 | System.String | string |
BeginLat | 15 | System.Double | real |
BeginLon | 16 | System.Double | real |
EndLat | 17 | System.Double | real |
EndLon | 18 | System.Double | real |
EpisodeNarrative | 19 | System.String | string |
EventNarrative | 20 | System.String | string |
StormSummary | 21 | System.Object | dynamic |
16 - invoke operator
Invokes a lambda expression that receives the source of invoke as a tabular parameter argument.
Syntax
T | invoke
function(
[param1,
param2])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular source. |
function | string | ✔️ | The name of the lambda let expression or stored function name to be evaluated. |
param1, param2 … | string | Any additional lambda arguments to pass to the function. |
Returns
Returns the result of the evaluated expression.
Example
This example shows how to use the invoke operator to call a lambda let expression:
// clipped_average(): calculates percentiles limits, and then makes another
// pass over the data to calculate average with values inside the percentiles
let clipped_average = (T:(x: long), lowPercentile:double, upPercentile:double)
{
let high = toscalar(T | summarize percentiles(x, upPercentile));
let low = toscalar(T | summarize percentiles(x, lowPercentile));
T
| where x > low and x < high
| summarize avg(x)
};
range x from 1 to 100 step 1
| invoke clipped_average(5, 99)
Output
avg_x |
---|
52 |
17 - lookup operator
Extends the columns of a fact table with values looked-up in a dimension table.
For example, the following query results in a table that extends the FactTable
($left
) with data from the DimensionTable
($right
) by performing a lookup. The lookup matches each pair (CommonColumn
, Col1
) from FactTable
with each pair (CommonColumn
, Col2
) in the DimensionTable
. For the differences between fact and dimension tables, see fact and dimension tables.
FactTable | lookup kind=leftouter (DimensionTable) on CommonColumn, $left.Col1 == $right.Col2
The lookup
operator performs an operation similar to the join operator
with the following differences:
- The result doesn't repeat columns from the $right table that are the basis for the join operation.
- Only two kinds of lookup are supported, leftouter and inner, with leftouter being the default.
- In terms of performance, the system by default assumes that the $left table is the larger (facts) table, and the $right table is the smaller (dimensions) table. This is exactly opposite to the assumption used by the join operator.
- The lookup operator automatically broadcasts the $right table to the $left table (essentially, behaves as if hint.broadcast was specified). This limits the size of the $right table.
Syntax
LeftTable | lookup [kind = (leftouter | inner)] ( RightTable ) on Attributes
Parameters
Name | Type | Required | Description |
---|---|---|---|
LeftTable | string | ✔️ | The table or tabular expression that is the basis for the lookup. Denoted as $left . |
RightTable | string | ✔️ | The table or tabular expression that is used to “populate” new columns in the fact table. Denoted as $right . |
Attributes | string | ✔️ | A comma-delimited list of one or more rules that describe how rows from LeftTable are matched to rows from RightTable. Multiple rules are evaluated using the and logical operator. See Rules. |
kind | string | Determines how to treat rows in LeftTable that have no match in RightTable. By default, leftouter is used, which means all those rows appear in the output with null values used for the missing values of RightTable columns added by the operator. If inner is used, such rows are omitted from the output. Other kinds of join aren’t supported by the lookup operator. |
Rules
Rule kind | Syntax | Predicate |
---|---|---|
Equality by name | ColumnName | where LeftTable.ColumnName == RightTable.ColumnName |
Equality by value | $left.LeftColumn == $right.RightColumn | where $left.LeftColumn == $right.RightColumn |
Returns
A table with:
- A column for every column in each of the two tables, including the matching keys. The columns of the right side are automatically renamed if there are name conflicts.
- A row for every match between the input tables. A match is a row selected from one table that has the same value for all the on fields as a row in the other table.
- The Attributes (lookup keys) appear only once in the output table.
- If kind is unspecified or kind=leftouter, then in addition to the inner matches, there's a row for every row on the left (and/or right), even if it has no match. In that case, the unmatched output cells contain nulls.
- If kind=inner, then there's a row in the output for every combination of matching rows from left and right.
Example
The following example shows how to perform a left outer join between the FactTable
and DimTable
, based on matching values in the Personal
and Family
columns.
let FactTable=datatable(Row:string,Personal:string,Family:string) [
"1", "Rowan", "Murphy",
"2", "Ellis", "Turner",
"3", "Ellis", "Turner",
"4", "Maya", "Robinson",
"5", "Quinn", "Campbell"
];
let DimTable=datatable(Personal:string,Family:string,Alias:string) [
"Rowan", "Murphy", "rowanm",
"Ellis", "Turner", "ellist",
"Maya", "Robinson", "mayar",
"Quinn", "Campbell", "quinnc"
];
FactTable
| lookup kind=leftouter DimTable on Personal, Family
Output
Row | Personal | Family | Alias |
---|---|---|---|
1 | Rowan | Murphy | rowanm |
2 | Ellis | Turner | ellist |
3 | Ellis | Turner | ellist |
4 | Maya | Robinson | mayar |
5 | Quinn | Campbell | quinnc |
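With kind=inner, fact rows that have no match in the dimension table are omitted instead of being filled with nulls. A minimal sketch reusing the tables above (with these sample tables every row matches, so the output is the same):
FactTable
| lookup kind=inner DimTable on Personal, Family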
Related content
18 - mv-apply operator
Applies a subquery to each record, and returns the union of the results of all subqueries.
For example, assume a table T
has a column Metric
of type dynamic
whose values are arrays of real
numbers. The following query locates the two biggest values in each Metric value, and returns the records corresponding to these values.
T | mv-apply Metric to typeof(real) on
(
top 2 by Metric desc
)
The mv-apply
operator has the following
processing steps:
- Uses the mv-expand operator to expand each record in the input into subtables (order is preserved).
- Applies the subquery for each of the subtables.
- Adds zero or more columns to the resulting subtable. These columns contain the values of the source columns that aren’t expanded, and are repeated where needed.
- Returns the union of the results.
The mv-apply
operator gets the following inputs:
One or more expressions that evaluate into dynamic arrays to expand. The number of records in each expanded subtable is the maximum length of each of those dynamic arrays. Null values are added where multiple expressions are specified and the corresponding arrays have different lengths.
Optionally, the names to assign the values of the expressions after expansion. These names become the columns names in the subtables. If not specified, the original name of the column is used when the expression is a column reference. A random name is used otherwise.
[!NOTE] It is recommended to use the default column names.
The data types of the elements of those dynamic arrays, after expansion. These become the column types of the columns in the subtables. If not specified,
dynamic
is used.Optionally, the name of a column to add to the subtables that specifies the 0-based index of the element in the array that resulted in the subtable record.
Optionally, the maximum number of array elements to expand.
The mv-apply
operator can be thought of as a generalization of the
mv-expand
operator (in fact, the latter can be implemented
by the former, if the subquery includes only projections.)
Syntax
T | mv-apply [ItemIndex] ColumnsToExpand [RowLimit] on ( SubQuery )

Where ItemIndex has the syntax:

with_itemindex = IndexColumnName

ColumnsToExpand is a comma-separated list of one or more elements of the form:

[Name =] ArrayExpression [to typeof ( Typename )]

RowLimit is simply:

limit RowLimit

and SubQuery has the same syntax as any query statement.
Parameters
Name | Type | Required | Description |
---|---|---|---|
ItemIndex | string | Indicates the name of a column of type long that’s appended to the input as part of the array-expansion phase and indicates the 0-based array index of the expanded value. | |
Name | string | The name to assign the array-expanded values of each array-expanded expression. If not specified, the name of the column is used if available. A random name is generated if ArrayExpression isn’t a simple column name. | |
ArrayExpression | dynamic | ✔️ | The array whose values are array-expanded. If the expression is the name of a column in the input, the input column is removed from the input and a new column of the same name, or ColumnName if specified, appears in the output. |
Typename | string | The name of the type that the individual elements of the dynamic array ArrayExpression take. Elements that don’t conform to this type are replaced by a null value. If unspecified, dynamic is used by default. | |
RowLimit | int | A limit on the number of records to generate from each record of the input. If unspecified, 2147483647 is used. | |
SubQuery | string | A tabular query expression with an implicit tabular source that gets applied to each array-expanded subtable. |
Examples
Review the examples and run them in your Data Explorer query page.
Getting the largest element from the array
The query outputs the largest even number (8) and the largest odd number (7).
let _data =
range x from 1 to 8 step 1
| summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply element=l to typeof(long) on
(
top 1 by element
)
Output
xMod2 | l | element |
---|---|---|
1 | [1, 3, 5, 7] | 7 |
0 | [2, 4, 6, 8] | 8 |
Calculating the sum of the largest two elements in an array
The query outputs the sum of the top 2 even numbers (6 + 8 = 14) and the sum of the top 2 odd numbers (5 + 7 = 12).
let _data =
range x from 1 to 8 step 1
| summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply l to typeof(long) on
(
top 2 by l
| summarize SumOfTop2=sum(l)
)
Output
xMod2 | l | SumOfTop2 |
---|---|---|
1 | [1,3,5,7] | 12 |
0 | [2,4,6,8] | 14 |
Select elements in arrays
The query identifies the top 2 elements from each dynamic array based on the Arr2 values and summarizes them into new lists.
datatable (Val:int, Arr1:dynamic, Arr2:dynamic)
[ 1, dynamic(['A1', 'A2', 'A3']), dynamic([10, 30, 7]),
7, dynamic(['B1', 'B2', 'B5']), dynamic([15, 11, 50]),
3, dynamic(['C1', 'C2', 'C3', 'C4']), dynamic([6, 40, 20, 8])
]
| mv-apply NewArr1=Arr1, NewArr2=Arr2 to typeof(long) on (
top 2 by NewArr2
| summarize NewArr1=make_list(NewArr1), NewArr2=make_list(NewArr2)
)
Output
Val | Arr1 | Arr2 | NewArr1 | NewArr2 |
---|---|---|---|---|
1 | [“A1”,“A2”,“A3”] | [10,30,7] | [“A2”,“A1”] | [30,10] |
7 | [“B1”,“B2”,“B5”] | [15,11,50] | [“B5”,“B1”] | [50,15] |
3 | [“C1”,“C2”,“C3”,“C4”] | [6,40,20,8] | [“C2”,“C3”] | [40,20] |
Using with_itemindex
for working with a subset of the array
The query results in a table with rows where the index is 3 or greater, including the index and element values from the original lists of even and odd numbers.
let _data =
range x from 1 to 10 step 1
| summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply with_itemindex=index element=l to typeof(long) on
(
// here you have 'index' column
where index >= 3
)
| project index, element
Output
index | element |
---|---|
3 | 7 |
4 | 9 |
3 | 8 |
4 | 10 |
Using multiple columns to join elements of two arrays
The query combines elements from two dynamic arrays into a new concatenated format and then summarizes them into lists.
datatable (Val: int, Arr1: dynamic, Arr2: dynamic)
[
1, dynamic(['A1', 'A2', 'A3']), dynamic(['B1', 'B2', 'B3']),
5, dynamic(['C1', 'C2']), dynamic(['D1', 'D2'])
]
| mv-apply Arr1, Arr2 on (
extend Out = strcat(Arr1, "_", Arr2)
| summarize Arr1 = make_list(Arr1), Arr2 = make_list(Arr2), Out= make_list(Out)
)
Output
Val | Arr1 | Arr2 | Out |
---|---|---|---|
1 | [“A1”,“A2”,“A3”] | [“B1”,“B2”,“B3”] | [“A1_B1”,“A2_B2”,“A3_B3”] |
5 | [“C1”,“C2”] | [“D1”,“D2”] | [“C1_D1”,“C2_D2”] |
Applying mv-apply to a property bag
This query dynamically removes properties from the packed values object based on the criteria that their values do not start with “555”. The final result contains the original columns with unwanted properties removed.
datatable(SourceNumber: string, TargetNumber: string, CharsCount: long)
[
'555-555-1234', '555-555-1212', 46,
'555-555-1212', '', int(null)
]
| extend values = pack_all()
| mv-apply removeProperties = values on
(
mv-expand kind = array values
| where values[1] !startswith "555"
| summarize propsToRemove = make_set(values[0])
)
| extend values = bag_remove_keys(values, propsToRemove)
| project-away propsToRemove
Output
SourceNumber | TargetNumber | CharsCount | values |
---|---|---|---|
555-555-1234 | 555-555-1212 | 46 | { “SourceNumber”: “555-555-1234”, “TargetNumber”: “555-555-1212” } |
555-555-1212 | | | { “SourceNumber”: “555-555-1212” } |
Related content
- mv-expand operator
19 - mv-expand operator
Expands multi-value dynamic arrays or property bags into multiple records.
mv-expand can be described as the opposite of the aggregation operators that pack multiple values into a single dynamic-typed array or property bag, such as summarize … make-list() and make-series. Each element in the (scalar) array or property bag generates a new record in the output of the operator. All columns of the input that aren't expanded are duplicated to all the records in the output.
Syntax
T | mv-expand [kind= (bag | array)] [with_itemindex= IndexColumnName] ColumnName [to typeof( Typename)] [, ColumnName …] [limit Rowlimit]

T | mv-expand [kind= (bag | array)] [Name =] ArrayExpression [to typeof( Typename)] [, [Name =] ArrayExpression [to typeof( Typename)] …] [limit Rowlimit]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName, ArrayExpression | string | ✔️ | A column reference, or a scalar expression with a value of type dynamic that holds an array or a property bag. The individual top-level elements of the array or property bag get expanded into multiple records.When ArrayExpression is used and Name doesn’t equal any input column name, the expanded value is extended into a new column in the output. Otherwise, the existing ColumnName is replaced. |
Name | string | A name for the new column. | |
Typename | string | ✔️ | Indicates the underlying type of the array’s elements, which becomes the type of the column produced by the mv-expand operator. The operation of applying type is cast-only and doesn’t include parsing or type-conversion. Array elements that don’t conform with the declared type become null values. |
RowLimit | int | The maximum number of rows generated from each original row. The default is 2147483647. mvexpand is a legacy and obsolete form of the operator mv-expand . The legacy version has a default row limit of 128. | |
IndexColumnName | string | If with_itemindex is specified, the output includes another column named IndexColumnName that contains the index starting at 0 of the item in the original expanded collection. |
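For instance, a minimal sketch of capping the number of rows produced per input record with limit; the inline datatable is only illustrative:
datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20, 30, 40])
]
| mv-expand b limit 2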
Returns
For each record in the input, the operator returns zero, one, or many records in the output, as determined in the following way:
Input columns that aren’t expanded appear in the output with their original value. If a single input record is expanded into multiple output records, the value is duplicated to all records.
For each ColumnName or ArrayExpression that is expanded, the number of output records is determined for each value as explained in modes of expansion. For each input record, the maximum number of output records is calculated. All arrays or property bags are expanded “in parallel” so that missing values (if any) are replaced by null values. Elements are expanded into rows in the order that they appear in the original array/bag.
If the dynamic value is null, then a single record is produced for that value (null). If the dynamic value is an empty array or property bag, no record is produced for that value. Otherwise, as many records are produced as there are elements in the dynamic value.
The expanded columns are of type dynamic
, unless they’re explicitly typed
by using the to typeof()
clause.
Modes of expansion
Two modes of property bag expansions are supported:
- kind=bag or bagexpansion=bag: Property bags are expanded into single-entry property bags. This mode is the default mode.
- kind=array or bagexpansion=array: Property bags are expanded into two-element [key, value] array structures, allowing uniform access to keys and values. This mode also allows, for example, running a distinct-count aggregation over property names.
Examples
The examples in this section show how to use the syntax to help you get started.
Single column - array expansion
datatable (a: int, b: dynamic)
[
1, dynamic([10, 20]),
2, dynamic(['a', 'b'])
]
| mv-expand b
Output
a | b |
---|---|
1 | 10 |
1 | 20 |
2 | a |
2 | b |
Single column - bag expansion
A simple expansion of a single column:
datatable (a: int, b: dynamic)
[
1, dynamic({"prop1": "a1", "prop2": "b1"}),
2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand b
Output
a | b |
---|---|
1 | {“prop1”: “a1”} |
1 | {“prop2”: “b1”} |
2 | {“prop1”: “a2”} |
2 | {“prop2”: “b2”} |
Single column - bag expansion to key-value pairs
A simple bag expansion to key-value pairs:
datatable (a: int, b: dynamic)
[
1, dynamic({"prop1": "a1", "prop2": "b1"}),
2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand kind=array b
| extend key = b[0], val=b[1]
Output
a | b | key | val |
---|---|---|---|
1 | [“prop1”,“a1”] | prop1 | a1 |
1 | [“prop2”,“b1”] | prop2 | b1 |
2 | [“prop1”,“a2”] | prop1 | a2 |
2 | [“prop2”,“b2”] | prop2 | b2 |
Zipped two columns
Expanding two columns will first ‘zip’ the applicable columns and then expand them:
datatable (a: int, b: dynamic, c: dynamic)[
1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 4, 3])
]
| mv-expand b, c
Output
a | b | c |
---|---|---|
1 | {“prop1”:“a”} | 5 |
1 | {“prop2”:“b”} | 4 |
1 | 3 |
Cartesian product of two columns
If you want to get a Cartesian product of expanding two columns, expand one after the other:
datatable (a: int, b: dynamic, c: dynamic)
[
1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 6])
]
| mv-expand b
| mv-expand c
Output
a | b | c |
---|---|---|
1 | { “prop1”: “a”} | 5 |
1 | { “prop1”: “a”} | 6 |
1 | { “prop2”: “b”} | 5 |
1 | { “prop2”: “b”} | 6 |
Convert output
To force the output of an mv-expand to a certain type (default is dynamic), use to typeof
:
datatable (a: string, b: dynamic, c: dynamic)[
"Constant", dynamic([1, 2, 3, 4]), dynamic([6, 7, 8, 9])
]
| mv-expand b, c to typeof(int)
| getschema
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
a | 0 | System.String | string |
b | 1 | System.Object | dynamic |
c | 2 | System.Int32 | int |
Notice column b
is returned as dynamic
while c
is returned as int
.
Using with_itemindex
Expansion of an array with with_itemindex
:
range x from 1 to 4 step 1
| summarize x = make_list(x)
| mv-expand with_itemindex=Index x
Output
x | Index |
---|---|
1 | 0 |
2 | 1 |
3 | 2 |
4 | 3 |
Related content
- mv-apply operator.
- For the opposite of the mv-expand operator, see summarize make_list().
- For expanding dynamic JSON objects into columns using property bag keys, see bag_unpack() plugin.
20 - parse operator
Evaluates a string expression and parses its value into one or more calculated columns. The calculated columns return null
values for unsuccessfully parsed strings. If there’s no need to use rows where parsing doesn’t succeed, prefer using the parse-where operator.
Syntax
T | parse [kind= kind [flags= regexFlags]] expression with [*] stringConstant columnName [: columnType] [*] , …
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to parse. |
kind | string | ✔️ | One of the supported kind values. The default value is simple . |
regexFlags | string | If kind is regex , then you can specify regex flags to be used like U for ungreedy, m for multi-line mode, s for match new line \n , and i for case-insensitive. More flags can be found in Flags. | |
expression | string | ✔️ | An expression that evaluates to a string. |
stringConstant | string | ✔️ | A string constant for which to search and parse. |
columnName | string | ✔️ | The name of a column to assign a value to, extracted from the string expression. |
columnType | string | The scalar value that indicates the type to convert the value to. The default is string . |
Supported kind
values
Text | Description |
---|---|
simple | This is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types. |
regex | stringConstant can be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types. |
relaxed | stringConstant is a regular string value and the match is relaxed. All string delimiters should appear in the parsed string, but extended columns might partially match the required types. Extended columns that didn’t match the required types get the value null . |
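For instance, a minimal sketch of relaxed parsing, assuming the Traces table defined in the examples below; rows where an extracted value can't be cast to the declared type would still appear, with null in that column:
Traces
| parse kind=relaxed EventText with * "totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long *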
Regex mode
In regex mode, parse translates the pattern to a regex. Use regular expressions to do the matching and use numbered captured groups that are handled internally. For example:
parse kind=regex Col with * <regex1> var1:string <regex2> var2:long
In the parse statement, the regex internally generated by the parse is .*?<regex1>(.*?)<regex2>(\-\d+)
.
- * was translated to .*?
- string was translated to .*?
- long was translated to \-\d+
Returns
The input table extended according to the list of columns that are provided to the operator.
Examples
The examples in this section show how to use the syntax to help you get started.
The parse operator provides a streamlined way to extend a table by using multiple extract applications on the same string expression. This is useful when the table has a string column that contains several values that you want to break into individual columns, such as a column produced by a developer trace ("printf"/"Console.WriteLine") statement.
Parse and extend results
In the following example, the column EventText
of table Traces
contains
strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices={1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5})
.
The operation extends the table with six columns: resourceName
, totalSlices
, sliceNumber
, lockTime
, releaseTime
, and previousLockTime
.
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project resourceName, totalSlices, sliceNumber, lockTime, releaseTime, previousLockTime
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 15 | 02/17/2016 08:40:00 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 27 | 23 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 16 | 02/17/2016 08:41:00 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Extract email alias and DNS
In the following example, the Contacts column of the Leads table is parsed to extract the alias and domain from an email address, and the domain from a website URL. The query returns the EmailAddress, EmailAlias, and WebsiteDomain columns, where the EmailAddress column combines the parsed email alias and domain.
let Leads=datatable(Contacts: string)
[
"Event: LeadContact (email=john@contosohotel.com, Website=https:contosohotel.com)",
"Event: LeadContact (email=abi@fourthcoffee.com, Website=https:www.fourthcoffee.com)",
"Event: LeadContact (email=nevena@treyresearch.com, Website=https:treyresearch.com)",
"Event: LeadContact (email=faruk@tailspintoys.com, Website=https:tailspintoys.com)",
"Event: LeadContact (email=ebere@relecloud.com, Website=https:relecloud.com)",
];
Leads
| parse Contacts with * "email=" alias:string "@" domain: string ", Website=https:" WebsiteDomain: string ")"
| project EmailAddress=strcat(alias, "@", domain), EmailAlias=alias, WebsiteDomain
Output
EmailAddress | EmailAlias | WebsiteDomain |
---|---|---|
nevena@treyresearch.com | nevena | treyresearch.com |
john@contosohotel.com | john | contosohotel.com |
faruk@tailspintoys.com | faruk | tailspintoys.com |
ebere@relecloud.com | ebere | relecloud.com |
abi@fourthcoffee.com | abi | www.fourthcoffee.com |
Regex mode
In the following example, regular expressions are used to parse and extract data from the EventText
column. The extracted data is projected into new fields.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex EventText with "(.*?)[a-zA-Z]*=" resourceName @", totalSlices=\s*\d+\s*.*?sliceNumber=" sliceNumber: long ".*?(previous)?lockTime=" lockTime ".*?releaseTime=" releaseTime ".*?previousLockTime=" previousLockTime: date "\\)"
| project resourceName, sliceNumber, lockTime, releaseTime, previousLockTime
Output
resourceName | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|
PipelineScheduler | 15 | 02/17/2016 08:40:00, | 02/17/2016 08:40:00, | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 23 | 02/17/2016 08:40:01, | 02/17/2016 08:40:01, | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 20 | 02/17/2016 08:40:01, | 02/17/2016 08:40:01, | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 16 | 02/17/2016 08:41:00, | 02/17/2016 08:41:00, | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 22 | 02/17/2016 08:41:01, | 02/17/2016 08:41:00, | 2016-02-17 08:40:01.0000000 |
Regex mode with regex flags
In the following example, resourceName is extracted.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex EventText with * "resourceName=" resourceName ',' *
| project resourceName
Output
resourceName |
---|
PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01 |
PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00 |
PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01 |
PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00 |
PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00 |
If there are records where resourceName sometimes appears in lowercase and sometimes in uppercase, you might get nulls for some values.
The results in the previous example are unexpected and include the full event data, because the default mode is greedy.
To extract only resourceName, run the previous query with the non-greedy (U) and case-insensitive (i) regex flags.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=Ui EventText with * "RESOURCENAME=" resourceName ',' *
| project resourceName
Output
resourceName |
---|
PipelineScheduler |
PipelineScheduler |
PipelineScheduler |
PipelineScheduler |
PipelineScheduler |
If the parsed string contains newlines, use the s flag to parse the text.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=23\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=15\nlockTime=02/17/2016 08:40:00\nreleaseTime=02/17/2016 08:40:00\npreviousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=20\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=22\nlockTime=02/17/2016 08:41:01\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=16\nlockTime=02/17/2016 08:41:00\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=s EventText with * "resourceName=" resourceName: string "(.*?)totalSlices=" totalSlices: long "(.*?)lockTime=" lockTime: datetime "(.*?)releaseTime=" releaseTime: datetime "(.*?)previousLockTime=" previousLockTime: datetime "\\)"
| project-away EventText
Output
resourceName | totalSlices | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|
PipelineScheduler | 27 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:40:00.0000000 | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 27 | 2016-02-17 08:41:01.0000000 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Relaxed mode
In the following relaxed mode example, the extended column totalSlices
must be of type long
. However, in the parsed string, it has the value nonValidLongValue
.
For the extended column, releaseTime
, the value nonValidDateTime
can’t be parsed as datetime
.
These two extended columns result in null
values while the other columns, such as sliceNumber
, still result in the correct values.
If you use option kind = simple
for the following query, you get null
results for all extended columns. This option is strict on extended columns, and is the difference between relaxed and simple mode.
let Traces=datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=nonValidDateTime, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=relaxed EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project-away EventText
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 15 | 02/17/2016 08:40:00 | | 2016-02-17 08:39:00.0000000 |
PipelineScheduler | 27 | 23 | 02/17/2016 08:40:01 | | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | | 20 | 02/17/2016 08:40:01 | | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | | 16 | 02/17/2016 08:41:00 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:00.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
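For comparison, here's a minimal sketch (not part of the original example set) of the same parse run with kind=simple over the same Traces input. Because simple mode is strict, the rows containing the invalid values are expected to come back with null in all extended columns:
// Sketch: same pattern as the relaxed query above, but using the strict simple mode.
Traces
| parse kind=simple EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project-away EventText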
Related content
21 - parse-kv operator
Extracts structured information from a string expression and represents the information in a key/value form.
The following extraction modes are supported:
- Specified delimiter: Extraction based on specified delimiters that dictate how keys/values and pairs are separated from each other.
- Non-specified delimiter: Extraction with no need to specify delimiters. Any nonalphanumeric character is considered a delimiter.
- Regex: Extraction based on regular expressions.
Syntax
Specified delimiter
T | parse-kv Expression as ( KeysList ) with ( pair_delimiter = PairDelimiter , kv_delimiter = KvDelimiter [, quote = QuoteChars … [, escape = EscapeChar …]] [, greedy = true ] )
Nonspecified delimiter
T | parse-kv Expression as ( KeysList ) with ( [ quote = QuoteChars … [, escape = EscapeChar …]] )
Regex
T | parse-kv Expression as ( KeysList ) with ( regex = RegexPattern )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expression | string | ✔️ | The expression from which to extract key values. |
KeysList | string | ✔️ | A comma-separated list of key names and their value data types. The order of the keys doesn’t have to match the order in which they appear in the text. |
PairDelimiter | string | A delimiter that separates key value pairs from each other. | |
KvDelimiter | string | A delimiter that separates keys from values. | |
QuoteChars | string | A one- or two-character string literal representing opening and closing quotes that key name or the extracted value may be wrapped with. The parameter can be repeated to specify a separate set of opening/closing quotes. | |
EscapeChar | string | A one-character string literal describing a character that may be used for escaping special characters in a quoted value. The parameter can be repeated if multiple escape characters are used. | |
RegexPattern | string | A regular expression containing two capturing groups exactly. The first group represents the key name, and the second group represents the key value. |
Returns
The original input tabular expression T, extended with columns per specified keys to extract.
Examples
The examples in this section show how to use the syntax to help you get started.
Extraction with well-defined delimiters
In this query, keys and values are separated by well-defined delimiters: comma and colon characters.
print str="ThreadId:458745723, Machine:Node001, Text: The service is up, Level: Info"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (pair_delimiter=',', kv_delimiter=':')
| project-away str
Output
Text | ThreadId | Machine |
---|---|---|
The service is up | 458745723 | Node001 |
Extraction with value quoting
Sometimes key names or values are wrapped in quotes, which allow the values themselves to contain delimiter characters. The following examples show how a quote
argument is used for extracting such values.
print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="connection aborted" "event time"=2021-01-01T10:00:54'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"')
| project-away str
Output
event time | src | dst | bytes | failure |
---|---|---|---|---|
2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | connection aborted |
This query uses different opening and closing quotes:
print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure=(connection aborted) (event time)=(2021-01-01 10:00:54)'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='()')
| project-away str
Output
event time | src | dst | bytes | failure |
---|---|---|---|---|
2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | connection aborted |
The values themselves may contain properly escaped quote characters, as the following example shows:
print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="the remote host sent \\"bye!\\"" time=2021-01-01T10:00:54'
| parse-kv str as (['time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"', escape='\\')
| project-away str
Output
time | src | dst | bytes | failure |
---|---|---|---|---|
2021-01-01 10:00:54.0000000 | 10.1.1.123 | 10.1.1.124 | 125 | the remote host sent “bye!” |
Extraction in greedy mode
There are cases when unquoted values may contain pair delimiters. In this case, use the greedy
mode to indicate to the operator to scan until the next key appearance (or end of string) when looking for the value ending.
The following examples compare how the operator works with and without the greedy
mode specified:
print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=')
| project-away str
Output
name | phone | city |
---|---|---|
John | 555 | New |
print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=', greedy=true)
| project-away str
Output
name | phone | city |
---|---|---|
John Doe | 555 5555 | New York |
Extraction with no well-defined delimiters
In the following example, any nonalphanumeric character is considered a valid delimiter:
print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: Started"
| parse-kv str as (Text: string, ThreadId:long, Machine: string)
| project-away str
Output
Text | ThreadId | Machine |
---|---|---|
Started | 458745723 | Node001 |
Value quoting and escaping are allowed in this mode, as shown in the following example:
print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: 'The service \\' is up'"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (quote="'", escape='\\')
| project-away str
Output
Text | ThreadId | Machine |
---|---|---|
The service ’ is up | 458745723 | Node001 |
Extraction using regex
When delimiters alone don't define the text structure well enough, extraction based on regular expressions can be useful.
print str=@'["referer url: https://hostname.com/redirect?dest=/?h=1234", "request url: https://hostname.com/?h=1234", "advertiser id: 24fefbca-cf27-4d62-a623-249c2ad30c73"]'
| parse-kv str as (['referer url']:string, ['request url']:string, ['advertiser id']: guid) with (regex=@'"([\w ]+)\s*:\s*([^"]*)"')
| project-away str
Output
referer url | request url | advertiser id |
---|---|---|
https://hostname.com/redirect?dest=/?h=1234 | https://hostname.com/?h=1234 | 24fefbca-cf27-4d62-a623-249c2ad30c73 |
22 - parse-where operator
Evaluates a string expression, and parses its value into one or more calculated columns. The result is only the successfully parsed strings.
parse-where
parses the strings in the same way as parse, and filters out strings that were not parsed successfully.
See parse operator, which produces nulls for unsuccessfully parsed strings.
Syntax
T | parse-where [kind= kind [flags= regexFlags]] expression with * (stringConstant columnName [: columnType]) * …
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to parse. |
kind | string | ✔️ | One of the supported kind values. The default value is simple . |
regexFlags | string | If kind is regex , then you can specify regex flags to be used like U for ungreedy, m for multi-line mode, s for match new line \n , and i for case-insensitive. More flags can be found in Flags. | |
expression | string | ✔️ | An expression that evaluates to a string. |
stringConstant | string | ✔️ | A string constant for which to search and parse. |
columnName | string | ✔️ | The name of a column to assign a value to, extracted from the string expression. |
columnType | string | | The scalar value that indicates the type to convert the value to. The default is string. |
Supported kind values
Text | Description |
---|---|
simple | This is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types. |
regex | stringConstant may be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types. |
Regex mode
In regex mode, parse translates the pattern into a regular expression and performs the matching by using numbered capture groups that are handled internally. For example:
parse-where kind=regex Col with * <regex1> var1:string <regex2> var2:long
The regex generated internally by parse is .*?<regex1>(.*?)<regex2>(\-\d+).
- * was translated to .*?.
- string was translated to .*?.
- long was translated to \-\d+.
Returns
The input table, which is extended according to the list of columns that are provided to the operator.
Examples
The examples in this section show how to use the syntax to help you get started.
The parse-where
operator provides a streamlined way to extend
a table by using multiple extract
applications on the same string
expression. This is most useful when the table has a string
column that contains several values that you want to break into individual columns. For example, you can break up a column that was produced by a developer trace ("printf
"/"Console.WriteLine
") statement.
Using parse
In the example below, the column EventText of table Traces contains strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices={1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5}). The operation below extends the table with six columns: resourceName, totalSlices, sliceNumber, lockTime, releaseTime, and previousLockTime.
A few of the strings don’t have a full match.
Using parse
, the calculated columns will have nulls.
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project
resourceName,
totalSlices,
sliceNumber,
lockTime,
releaseTime,
previousLockTime
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Using parse-where
Using parse-where filters out unsuccessfully parsed strings from the result.
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project
resourceName,
totalSlices,
sliceNumber,
lockTime,
releaseTime,
previousLockTime
Output
resourceName | totalSlices | sliceNumber | lockTime | releaseTime | previousLockTime |
---|---|---|---|---|---|
PipelineScheduler | 27 | 20 | 02/17/2016 08:40:01 | 2016-02-17 08:40:01.0000000 | 2016-02-17 08:39:01.0000000 |
PipelineScheduler | 27 | 22 | 02/17/2016 08:41:01 | 2016-02-17 08:41:00.0000000 | 2016-02-17 08:40:01.0000000 |
Regex mode using regex flags
To get the resourceName and totalSlices, use the following query:
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices
Output
resourceName | totalSlices |
---|---|
parse-where with a case-insensitive regex flag
In the above query, the default mode was case-sensitive, so the strings weren't parsed successfully and no results were returned.
To get the required result, run parse-where with the case-insensitive (i) regex flag.
Only three strings are parsed successfully, so the result is three records (in the other two, totalSlices holds an invalid integer).
let Traces = datatable(EventText: string)
[
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
"Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex flags=i EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices
Output
resourceName | totalSlices |
---|---|
PipelineScheduler | 27 |
PipelineScheduler | 27 |
PipelineScheduler | 27 |
23 - partition operator
The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries.
The partition operator is useful when you need to perform a subquery only on a subset of rows that belong to the same partition key, and not a query of the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.
The partition operator supports several strategies of subquery operation:
- Native - use with an implicit data source with thousands of key partition values.
- Shuffle - use with an implicit source with millions of key partition values.
- Legacy - use with an implicit or explicit source for 64 or fewer key partition values.
Syntax
T | partition [ hint.strategy= Strategy ] [ Hints ] by Column ( TransformationSubQuery )
T | partition [ hint.strategy=legacy ] [ Hints ] by Column { SubQueryWithSource }
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular source. |
Strategy | string | The value legacy , shuffle , or native . This hint defines the execution strategy of the partition operator.If no strategy is specified, the legacy strategy is used. For more information, see Strategies. | |
Column | string | ✔️ | The name of a column in T whose values determine how to partition the input tabular source. |
TransformationSubQuery | string | ✔️ | A tabular transformation expression. The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.The expression must provide only one tabular result and shouldn’t have other types of statements, such as let statements. |
SubQueryWithSource | string | ✔️ | A tabular expression that includes its own tabular source, such as a table reference. This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar( Column) .The expression must provide only one tabular result and shouldn’t have other types of statements, such as let statements. |
Hints | string | Zero or more space-separated parameters in the form of: HintName = Value that control the behavior of the operator. See the supported hints per strategy type. |
Supported hints
Hint name | Type | Strategy | Description |
---|---|---|---|
hint.shufflekey | string | shuffle | The partition key used to run the partition operator with the shuffle strategy. |
hint.materialized | bool | legacy | If set to true , materializes the source of the partition operator. The default value is false . |
hint.concurrency | int | legacy | Determines how many partitions to run in parallel. The default value is 16 . |
hint.spread | int | legacy | Determines how to distribute the partitions among cluster nodes. The default value is 1 .For example, if there are N partitions and the spread hint is set to P, then the N partitions are processed by P different cluster nodes equally, in parallel/sequentially depending on the concurrency hint. |
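For illustration only, the legacy-strategy hints can be combined in a single query. The following is a minimal sketch, assuming the StormEvents sample table used in the examples later in this section:
// Sketch: run up to 10 legacy partitions in parallel, spread across 2 cluster nodes.
StormEvents
| partition hint.strategy=legacy hint.concurrency=10 hint.spread=2 by State
(
    top 3 by DamageProperty
    | project State, EventType, DamageProperty
)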
Returns
The operator returns a union of the results of the individual subqueries.
Strategies
The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.
Native strategy
This strategy should be applied when the number of distinct values of the partition key isn’t large, roughly in the thousands.
The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.
To use this strategy, specify hint.strategy=native
.
Shuffle strategy
This strategy should be applied when the number of distinct values of the partition key is large, in the millions.
The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.
To use this strategy, specify hint.strategy=shuffle
. For more information about shuffle strategy and performance, see shuffle query.
Supported operators for the native and shuffle strategies
The following list of operators can be used in subqueries with the native or shuffle strategies:
- count
- distinct
- extend
- make-series
- mv-apply
- mv-expand
- parse
- parse-where
- project
- project-away
- project-keep
- project-rename
- project-reorder
- reduce
- sample
- sample-distinct
- scan
- search
- serialize
- sort
- summarize
- take
- top
- top-hitters
- top-nested
- where
Legacy strategy
For historical reasons, the legacy
strategy is the default strategy. However, we recommend favoring the native or shuffle strategies, as the legacy
approach is limited to 64 partitions and is less efficient.
In some scenarios, the legacy
strategy might be necessary due to its support for including a tabular source in the subquery. In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(
Column)
.
If the subquery is a tabular transformation without a tabular source, the source is implicit and is based on the subtable partitions.
To use this strategy, specify hint.strategy=legacy
or omit any other strategy indication.
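As a minimal sketch of this form (assuming the StormEvents sample table used in the examples below), the subquery brings its own tabular source and references the key column through toscalar():
range x from 1 to 2 step 1
| partition hint.strategy=legacy by x
{
    StormEvents
    | where InjuriesIndirect == toscalar(x)   // reference the key column via toscalar()
    | summarize Events=count()
}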
Examples
The examples in this section show how to use the syntax to help you get started.
Find top values
In some cases, it’s more performant and easier to write a query using the partition
operator than using the top-nested
operator. The following query runs a subquery calculating summarize
and top
for each State
starting with W
: “WYOMING”, “WASHINGTON”, “WEST VIRGINIA”, and “WISCONSIN”.
StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State
(
summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
| top 3 by Events
)
Output
EventType | State | Events | Injuries |
---|---|---|---|
Hail | WYOMING | 108 | 0 |
High Wind | WYOMING | 81 | 5 |
Winter Storm | WYOMING | 72 | 0 |
Heavy Snow | WASHINGTON | 82 | 0 |
High Wind | WASHINGTON | 58 | 13 |
Wildfire | WASHINGTON | 29 | 0 |
Thunderstorm Wind | WEST VIRGINIA | 180 | 1 |
Hail | WEST VIRGINIA | 103 | 0 |
Winter Weather | WEST VIRGINIA | 88 | 0 |
Thunderstorm Wind | WISCONSIN | 416 | 1 |
Winter Storm | WISCONSIN | 310 | 0 |
Hail | WISCONSIN | 303 | 1 |
Native strategy
The following query returns the top 2 EventType
values by TotalInjuries
for each State
that starts with ‘W’:
StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
(
summarize TotalInjuries = sum(InjuriesDirect) by EventType
| top 2 by TotalInjuries
)
Output
EventType | TotalInjuries |
---|---|
Tornado | 4 |
Hail | 1 |
Thunderstorm Wind | 1 |
Excessive Heat | 0 |
High Wind | 13 |
Lightning | 5 |
High Wind | 5 |
Avalanche | 3 |
Shuffle strategy
The following query returns the top 3 DamageProperty values for each EpisodeId, projects the EpisodeId, State, and DamageProperty columns, and then counts the resulting rows.
StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
top 3 by DamageProperty
| project EpisodeId, State, DamageProperty
)
| count
Output
Count |
---|
22345 |
Legacy strategy with explicit source
The following query runs two subqueries:
- When
x == 1
, the query returns all rows fromStormEvents
that haveInjuriesIndirect == 1
. - When
x == 2
, the query returns all rows fromStormEvents
that haveInjuriesIndirect == 2
.
The final result is the union of these two subqueries.
range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count
Output
Count |
---|
113 |
Partition reference
The following example shows how to use the as operator to give a “name” to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy
strategy.
T
| partition by Dim
(
as Partition
| extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)
24 - print operator
Outputs a single row with one or more scalar expression results as columns.
Syntax
print [ColumnName =] ScalarExpression [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ColumnName | string | The name to assign to the output column. | |
ScalarExpression | string | ✔️ | The expression to evaluate. |
Returns
A table with one or more columns and a single row. Each column returns the corresponding value of the evaluated ScalarExpression.
Examples
The examples in this section show how to use the syntax to help you get started.
Print sum and variable value
The following example outputs a row with two columns. One column contains the sum of a series of numbers and the other column contains the value of the variable, x
.
print 0 + 1 + 2 + 3 + 4 + 5, x = "Wow!"
Output
print_0 | x |
---|---|
15 | Wow! |
Print concatenated string
The following example outputs the results of the strcat()
function as a concatenated string.
print banner=strcat("Hello", ", ", "World!")
Output
banner |
---|
Hello, World! |
25 - Project operator
Select the columns to include, rename or drop, and insert new computed columns.
The order of the columns in the result is specified by the order of the arguments. Only the columns specified in the arguments are included in the result. Any other columns in the input are dropped.
Syntax
T | project [ColumnName | (ColumnName [, …]) =] Expression [, …]
or
T | project ColumnName [= Expression] [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input for which to project certain columns. |
ColumnName | string | A column name or comma-separated list of column names to appear in the output. | |
Expression | string | The scalar expression to perform over the input. |
- Either ColumnName or Expression must be specified.
- If there’s no Expression, then a column of ColumnName must appear in the input.
- If ColumnName is omitted, the output column name of Expression will be automatically generated.
- If Expression returns more than one column, a list of column names can be specified in parentheses. If a list of the column names isn’t specified, all Expression’s output columns with generated names will be added to the output.
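For instance, in this minimal sketch (assuming the StormEvents sample table used in the examples below), the computed column isn't named, so its output name is generated automatically:
StormEvents
| project EventId, InjuriesDirect + InjuriesIndirect   // the unnamed expression gets an auto-generated column name
| take 3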
Returns
A table with columns that were named as arguments. Contains same number of rows as the input table.
Examples
The examples in this section show how to use the syntax to help you get started.
Only show specific columns
Only show the EventId
, State
, EventType
of the StormEvents
table.
StormEvents
| project EventId, State, EventType
Output
The table shows the first 10 results.
EventId | State | EventType |
---|---|---|
61032 | ATLANTIC SOUTH | Waterspout |
60904 | FLORIDA | Heavy Rain |
60913 | FLORIDA | Tornado |
64588 | GEORGIA | Thunderstorm Wind |
68796 | MISSISSIPPI | Thunderstorm Wind |
68814 | MISSISSIPPI | Tornado |
68834 | MISSISSIPPI | Thunderstorm Wind |
68846 | MISSISSIPPI | Hail |
73241 | AMERICAN SAMOA | Flash Flood |
64725 | KENTUCKY | Flood |
… | … | … |
Potential manipulations using project
The following query renames the BeginLocation
column and creates a new column called TotalInjuries
from a calculation over two existing columns.
StormEvents
| project StartLocation = BeginLocation, TotalInjuries = InjuriesDirect + InjuriesIndirect
| where TotalInjuries > 5
Output
The table shows the first 10 results.
StartLocation | TotalInjuries |
---|---|
LYDIA | 15 |
ROYAL | 15 |
GOTHENBURG | 9 |
PLAINS | 8 |
KNOXVILLE | 9 |
CAROL STREAM | 11 |
HOLLY | 9 |
RUFFIN | 9 |
ENTERPRISE MUNI ARPT | 50 |
COLLIERVILLE | 6 |
… | … |
Related content
26 - project-away operator
Select what columns from the input table to exclude from the output table.
Syntax
T | project-away ColumnNameOrPattern [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input from which to remove columns. |
ColumnNameOrPattern | string | ✔️ | One or more column names or column wildcard-patterns to be removed from the output. |
Returns
A table with columns that weren’t named as arguments. Contains same number of rows as the input table.
Examples
The input table PopulationData has two columns: State and Population. Project away the Population column, and you're left with a list of state names.
PopulationData
| project-away Population
Output
The following table shows only the first 10 results.
State |
---|
ALABAMA |
ALASKA |
ARIZONA |
ARKANSAS |
CALIFORNIA |
COLORADO |
CONNECTICUT |
DELAWARE |
DISTRICT OF COLUMBIA |
FLORIDA |
… |
Project-away using a column name pattern
This query removes columns starting with the word “session”.
ConferenceSessions
| project-away session*
Output
The table shows only the first 10 results.
conference | owner | participants | URL | level | starttime | duration | time_and_duration | kusto_affinity |
---|---|---|---|---|---|---|---|---|
PASS Summit 2019 | Avner Aharoni | https://www.eventbrite.com/e/near-real-time-interact-analytics-on-big-data-using-azure-data-explorer-fg-tickets-77532775619 | 2019-11-07T19:15:00Z | Thu, Nov 7, 11:15 AM-12:15 PM PST | Focused | |||
PASS Summit | Rohan Kumar | Ariel Pisetzky | https://www.pass.org/summit/2018/Learn/Keynotes.aspx | 2018-11-07T08:15:00Z | 90 | Wed, Nov 7, 8:15-9:45 am | Mention | |
Intelligent Cloud 2019 | Rohan Kumar | Henning Rauch | 2019-04-09T09:00:00Z | 90 | Tue, Apr 9, 9:00-10:30 AM | Mention | ||
Ignite 2019 | Jie Feng | https://myignite.techcommunity.microsoft.com/sessions/83940 | 100 | 2019-11-06T14:35:00Z | 20 | Wed, Nov 6, 9:35 AM - 9:55 AM | Mention | |
Ignite 2019 | Bernhard Rode | Le Hai Dang, Ricardo Niepel | https://myignite.techcommunity.microsoft.com/sessions/81596 | 200 | 2019-11-06T16:45:00Z | 45 | Wed, Nov 6, 11:45 AM-12:30 PM | Mention |
Ignite 2019 | Tzvia Gitlin | Troyna | https://myignite.techcommunity.microsoft.com/sessions/83933 | 400 | 2019-11-06T17:30:00Z | 75 | Wed, Nov 6, 12:30 PM-1:30 PM | Focused |
Ignite 2019 | Jie Feng | https://myignite.techcommunity.microsoft.com/sessions/81057 | 300 | 2019-11-06T20:30:00Z | 45 | Wed, Nov 6, 3:30 PM-4:15 PM | Mention | |
Ignite 2019 | Manoj Raheja | https://myignite.techcommunity.microsoft.com/sessions/83939 | 300 | 2019-11-07T18:15:00Z | 20 | Thu, Nov 7, 1:15 PM-1:35 PM | Focused | |
Ignite 2019 | Uri Barash | https://myignite.techcommunity.microsoft.com/sessions/81060 | 300 | 2019-11-08T17:30:00Z | 45 | Fri, Nov8, 10:30 AM-11:15 AM | Focused | |
Ignite 2018 | Manoj Raheja | https://azure.microsoft.com/resources/videos/ignite-2018-azure-data-explorer-%E2%80%93-query-billions-of-records-in-seconds/ | 200 | 20 | Focused | |||
… | … | … | … | … | … | … | … | … |
Related content
- To choose what columns from the input to keep in the output, use project-keep.
- To rename columns, use
project-rename
. - To reorder columns, use
project-reorder
.
27 - project-keep operator
Select what columns from the input to keep in the output. Only the columns that are specified as arguments will be shown in the result. The other columns are excluded.
Syntax
T | project-keep ColumnNameOrPattern [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input from which to keep columns. |
ColumnNameOrPattern | string | ✔️ | One or more column names or column wildcard-patterns to be kept in the output. |
Returns
A table with columns that were named as arguments. Contains same number of rows as the input table.
Example
This query returns columns from the ConferenceSessions
table that contain the word “session”.
ConferenceSessions
| project-keep session*
Output
The output table shows only the first 10 results.
sessionid | session_title | session_type | session_location |
---|---|---|---|
COM64 | Focus Group: Azure Data Explorer | Focus Group | Online |
COM65 | Focus Group: Azure Data Explorer | Focus Group | Online |
COM08 | Ask the Team: Azure Data Explorer | Ask the Team | Online |
COM137 | Focus Group: Built-In Dashboard and Smart Auto Scaling Capabilities in Azure Data Explorer | Focus Group | Online |
CON-PRT157 | Roundtable: Monitoring and managing your Azure Data Explorer deployments | Roundtable | Online |
CON-PRT103 | Roundtable: Advanced Kusto query language topics | Roundtable | Online |
CON-PRT157 | Roundtable: Monitoring and managing your Azure Data Explorer deployments | Roundtable | Online |
CON-PRT103 | Roundtable: Advanced Kusto query language topics | Roundtable | Online |
CON-PRT130 | Roundtable: Data exploration and visualization with Azure Data Explorer | Roundtable | Online |
CON-PRT130 | Roundtable: Data exploration and visualization with Azure Data Explorer | Roundtable | Online |
… | … | … | … |
Related content
- To choose what columns from the input to exclude from the output, use project-away.
- To rename columns, use
project-rename
. - To reorder columns, use
project-reorder
.
28 - project-rename operator
Renames columns in the output table.
Syntax
T | project-rename NewColumnName = ExistingColumnName [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular data. |
NewColumnName | string | ✔️ | The new column name. |
ExistingColumnName | string | ✔️ | The name of the existing column to rename. |
Returns
A table that has the columns in the same order as in an existing table, with columns renamed.
Example
If you have a table with columns a, b, and c, and you want to rename a to new_a and b to new_b while keeping the same order, the query would look like this:
print a='alpha', b='bravo', c='charlie'
| project-rename new_a=a, new_b=b, new_c=c
Output
new_a | new_b | new_c |
---|---|---|
alpha | bravo | charlie |
29 - project-reorder operator
Reorders columns in the output table.
Syntax
T | project-reorder ColumnNameOrPattern [asc | desc | granny-asc | granny-desc] [, …]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular data. |
ColumnNameOrPattern | string | ✔️ | The name of the column or column wildcard pattern by which to order the columns. |
asc , desc , granny-asc , granny-desc | string | Indicates how to order the columns when a wildcard pattern is used. asc or desc orders columns by column name in ascending or descending manner, respectively. granny-asc or granny-desc orders by ascending or descending, respectively, while secondarily sorting by the next numeric value. For example, a20 comes before a100 when granny-asc is specified. |
Returns
A table that contains columns in the order specified by the operator arguments. project-reorder
doesn’t rename or remove columns from the table, therefore, all columns that existed in the source table, appear in the result table.
Examples
The examples in this section show how to use the syntax to help you get started.
Reorder with b first
Reorder a table with three columns (a, b, c) so the second column (b) will appear first.
print a='a', b='b', c='c'
| project-reorder b
Output
b | a | c |
---|---|---|
b | a | c |
Reorder with a first
Reorder columns of a table so that columns starting with a
will appear before other columns.
print b = 'b', a2='a2', a3='a3', a1='a1'
| project-reorder a* asc
Output
a1 | a2 | a3 | b |
---|---|---|---|
a1 | a2 | a3 | b |
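The following minimal sketch (not part of the original examples) shows the granny-asc ordering, which places a20 before a100 because the trailing numbers are compared numerically:
print b='b', a100='a100', a20='a20', a3='a3'
| project-reorder a* granny-asc
The a* columns are expected to come out in the order a3, a20, a100, followed by b.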
30 - Queries
A query is a read-only operation against data ingested into your cluster. Queries always run in the context of a particular database in the cluster. They may also refer to data in another database, or even in another cluster.
Because ad hoc querying of data is the top-priority scenario for Kusto, the Kusto Query Language syntax is optimized so that non-expert users can author and run queries over their data and understand unambiguously what each query does (logically).
The language syntax is that of a data flow, where "data" means "tabular data" (data arranged in a rectangular shape of rows and columns). At a minimum, a query consists of source data references (references to Kusto tables) and one or more query operators applied in sequence, indicated visually by the use of a pipe character (|) to delimit operators.
For example:
StormEvents
| where State == 'FLORIDA' and StartTime > datetime(2000-01-01)
| count
Each filter prefixed by the pipe character |
is an instance of an operator, with some parameters. The input to the operator is the table that is the result of the preceding pipeline. In most cases, any parameters are scalar expressions over the columns of the input.
In a few cases, the parameters are the names of input columns, and in a few cases, the parameter is a second table. The result of a query is always a table, even if it only has one column and one row.
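For example, here's a minimal sketch of an operator that takes a second table as a parameter, assuming the StormEvents and PopulationData sample tables referenced elsewhere in this document:
StormEvents
| where State == 'FLORIDA'
| join kind=inner (PopulationData) on State   // the second table is a parameter of the join operator
| take 5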
T is used in a query to denote the preceding pipeline or source table.
31 - range operator
Generates a single-column table of values.
Syntax
range columnName from start to stop step step
Parameters
Name | Type | Required | Description |
---|---|---|---|
columnName | string | ✔️ | The name of the single column in the output table. |
start | int, long, real, datetime, or timespan | ✔️ | The smallest value in the output. |
stop | int, long, real, datetime, or timespan | ✔️ | The highest value being generated in the output or a bound on the highest value if step is over this value. |
step | int, long, real, datetime, or timespan | ✔️ | The difference between two consecutive values. |
Returns
A table with a single column called columnName, whose values are start, start + step, … up to and including stop.
Examples
The example in this section shows how to use the syntax to help you get started.
Range over the past seven days
The following example creates a table with entries for the current time stamp extended over the past seven days, once a day.
range LastWeek from ago(7d) to now() step 1d
Output
LastWeek |
---|
2015-12-05 09:10:04.627 |
2015-12-06 09:10:04.627 |
… |
2015-12-12 09:10:04.627 |
Combine different stop times
The following example shows how to extend ranges to use multiple stop times by using the union
operator.
let Range1 = range Time from datetime(2024-01-01) to datetime(2024-01-05) step 1d;
let Range2 = range Time from datetime(2024-01-06) to datetime(2024-01-10) step 1d;
union Range1, Range2
| order by Time asc
Output
Time |
---|
2024-01-01 00:00:00.0000000 |
2024-01-02 00:00:00.0000000 |
2024-01-03 00:00:00.0000000 |
2024-01-04 00:00:00.0000000 |
2024-01-05 00:00:00.0000000 |
2024-01-06 00:00:00.0000000 |
2024-01-07 00:00:00.0000000 |
2024-01-08 00:00:00.0000000 |
2024-01-09 00:00:00.0000000 |
2024-01-10 00:00:00.0000000 |
Range using parameters
The following example shows how to use the range
operator with parameters, which are then extended and consumed as a table.
let toUnixTime = (dt:datetime)
{
(dt - datetime(1970-01-01)) / 1s
};
let MyMonthStart = startofmonth(now()); //Start of month
let StepBy = 4.534h; //Supported timespans
let nn = 64000; // Row Count parametrized
let MyTimeline = range MyMonthHour from MyMonthStart to now() step StepBy
| extend MyMonthHourinUnixTime = toUnixTime(MyMonthHour), DateOnly = bin(MyMonthHour,1d), TimeOnly = MyMonthHour - bin(MyMonthHour,1d)
; MyTimeline | order by MyMonthHour asc | take nn
Output
MyMonthHour | MyMonthHourinUnixTime | DateOnly | TimeOnly |
---|---|---|---|
2023-02-01 00:00:00.0000000 | 1675209600 | 2023-02-01 00:00:00.0000000 | 00:00:00 |
2023-02-01 04:32:02.4000000 | 1675225922.4 | 2023-02-01 00:00:00.0000000 | 04:32:02.4000000 |
2023-02-01 09:04:04.8000000 | 1675242244.8 | 2023-02-01 00:00:00.0000000 | 09:04:04.8000000 |
2023-02-01 13:36:07.2000000 | 1675258567.2 | 2023-02-01 00:00:00.0000000 | 13:36:07.2000000 |
… | … | … | … |
Incremented steps
The following example creates a table with a single column called Steps
whose type is long
and results in values from one to eight incremented by three.
range Steps from 1 to 8 step 3
Output
Steps |
---|
1 |
4 |
7 |
Traces over a time range
The following example shows how the range
operator can be used to create a dimension table that is used to introduce zeros where the source data has no values. It takes timestamps from the last four hours and counts traces for each one-minute interval. When there are no traces for a specific interval, the count is zero.
range TIMESTAMP from ago(4h) to now() step 1m
| join kind=fullouter
(Traces
| where TIMESTAMP > ago(4h)
| summarize Count=count() by bin(TIMESTAMP, 1m)
) on TIMESTAMP
| project Count=iff(isnull(Count), 0, Count), TIMESTAMP
| render timechart
32 - reduce operator
Groups a set of strings together based on value similarity.
For each such group, the operator returns a pattern
, count
, and representative
. The pattern
best describes the group, in which the *
character represents a wildcard. The count
is the number of values in the group, and the representative
is one of the original values in the group.
Syntax
T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expr | string | ✔️ | The value by which to reduce. |
Threshold | real | A value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1. We recommend setting a small threshold value for large inputs. With a smaller threshold value, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value requires less similarity, resulting in more groups that are less similar. See Examples. | |
ReduceKind | string | The only valid value is source . If source is specified, the operator appends the Pattern column to the existing rows in the table instead of aggregating by Pattern . |
Returns
A table with as many rows as there are groups and columns titled pattern
, count
, and representative
. The pattern
best describes the group, in which the *
character represents a wildcard, or placeholder for an arbitrary insertion string. The count
is the number of values in the group, and the representative
is one of the original values in the group.
For example, the result of reduce by city
might include:
Pattern | Count | Representative |
---|---|---|
San * | 5182 | San Bernard |
Saint * | 2846 | Saint Lucy |
Moscow | 3726 | Moscow |
* -on- * | 2730 | One -on- One |
Paris | 2716 | Paris |
Examples
The example in this section shows how to use the syntax to help you get started.
Small threshold value
This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.
range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.001 , characters = "X"
Output
Pattern | Count | Representative |
---|---|---|
MachineLearning* | 1000 | MachineLearningX4 |
Large threshold value
This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.
range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText with threshold=0.9 , characters = "X"
Output
The result includes only those groups where the MyText value appears in at least 90% of the rows.
Pattern | Count | Representative |
---|---|---|
MachineLearning* | 177 | MachineLearningX9 |
MachineLearning* | 102 | MachineLearningX0 |
MachineLearning* | 106 | MachineLearningX1 |
MachineLearning* | 96 | MachineLearningX6 |
MachineLearning* | 110 | MachineLearningX4 |
MachineLearning* | 100 | MachineLearningX3 |
MachineLearning* | 99 | MachineLearningX8 |
MachineLearning* | 104 | MachineLearningX7 |
MachineLearning* | 106 | MachineLearningX2 |
Behavior of the Characters parameter
If the Characters parameter is unspecified, then every character that isn't an ASCII letter or digit becomes a term separator.
range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str
Output
Pattern | Count | Representative |
---|---|---|
others | 10 |
However, if you specify that “Z” is a separator, then it’s as if each value in str
is two terms: foo
and tostring(x)
:
range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"
Output
Pattern | Count | Representative |
---|---|---|
foo* | 10 | fooZ1 |
Apply reduce
to sanitized input
The following example shows how one might apply the reduce
operator to a “sanitized”
input, in which GUIDs in the column being reduced are replaced before reducing:
Start with a few records from the Trace table. Then reduce the Text column which includes random GUIDs. As random GUIDs interfere with the reduce operation, replace them all by the string “GUID”. Now perform the reduce operation. In case there are other “quasi-random” identifiers with embedded ‘-’ or ‘_’ characters in them, treat characters as non-term-breakers.
Trace
| take 10000
| extend Text = replace(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID", Text)
| reduce by Text with characters="-_"
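A minimal sketch (not part of the original examples) of kind=source, which appends the Pattern column to the existing rows instead of aggregating by it, reusing the fooZ example from above:
// Sketch: each input row is kept, with an added Pattern column (expected to contain foo*).
range x from 1 to 10 step 1
| project str = strcat("foo", "Z", tostring(x))
| reduce kind=source by str with characters="Z"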
Related content
33 - sample operator
Returns up to the specified number of random rows from the input table.
Syntax
T | sample NumberOfRows
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
NumberOfRows | int, long, or real | ✔️ | The number of rows to return. You can specify any numeric expression. |
Examples
The example in this section shows how to use the syntax to help you get started.
Generate a sample
This query creates a range of numbers, samples one value, and then duplicates that sample.
let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)
Output
x |
---|
74 |
63 |
To ensure that _sample in the example above is calculated only once, use the materialize() function:
let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)
Output
x |
---|
24 |
24 |
Generate a sample of a certain percentage of data
To sample a certain percentage of your data (rather than a specified number of rows), you can use:
StormEvents | where rand() < 0.1
Output
The table contains the first few rows of the output. Run the query to view the full result.
StartTime | EndTime | EpisodeId | EventId | State | EventType |
---|---|---|---|---|---|
2007-01-01T00:00:00Z | 2007-01-20T10:24:00Z | 2403 | 11914 | INDIANA | Flood |
2007-01-01T00:00:00Z | 2007-01-24T18:47:00Z | 2408 | 11930 | INDIANA | Flood |
2007-01-01T00:00:00Z | 2007-01-01T12:00:00Z | 1979 | 12631 | DELAWARE | Heavy Rain |
2007-01-01T00:00:00Z | 2007-01-01T00:00:00Z | 2592 | 13208 | NORTH CAROLINA | Thunderstorm Wind |
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 1492 | 7069 | MINNESOTA | Drought |
2007-01-01T00:00:00Z | 2007-01-31T23:59:00Z | 2240 | 10858 | TEXAS | Drought |
… | … | … | … | … | … |
Generate a sample of keys
To sample keys rather than rows (for example - sample 10 Ids and get all rows for these Ids), you can use sample-distinct
in combination with the in
operator.
let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)
Output
The table contains the first few rows of the output. Run the query to view the full result.
StartTime | EndTime | EpisodeId | EventId | State | EventType |
---|---|---|---|---|---|
2007-09-18T20:00:00Z | 2007-09-19T18:00:00Z | 11074 | 60904 | FLORIDA | Heavy Rain |
2007-09-20T21:57:00Z | 2007-09-20T22:05:00Z | 11078 | 60913 | FLORIDA | Tornado |
2007-09-29T08:11:00Z | 2007-09-29T08:11:00Z | 11091 | 61032 | ATLANTIC SOUTH | Waterspout |
2007-12-07T14:00:00Z | 2007-12-08T04:00:00Z | 13183 | 73241 | AMERICAN SAMOA | Flash Flood |
2007-12-11T21:45:00Z | 2007-12-12T16:45:00Z | 12826 | 70787 | KANSAS | Flood |
2007-12-13T09:02:00Z | 2007-12-13T10:30:00Z | 11780 | 64725 | KENTUCKY | Flood |
… | … | … | … | … | … |
34 - sample-distinct operator
Returns a single column that contains up to the specified number of distinct values of the requested column.
The operator tries to return an answer as quickly as possible rather than trying to make a fair sample.
Syntax
T | sample-distinct
NumberOfValues of
ColumnName
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
NumberOfValues | int, long, or real | ✔️ | The number of distinct values of T to return. You can specify any numeric expression. |
ColumnName | string | ✔️ | The name of the column from which to sample. |
Examples
The examples in this section show how to use the syntax to help you get started.
Get 10 distinct values from a population
StormEvents | sample-distinct 10 of EpisodeId
Output
EpisodeId |
---|
11074 |
11078 |
11749 |
12554 |
12561 |
13183 |
11780 |
11781 |
12826 |
Further compute the sample values
let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)
| summarize totalInjuries=sum(InjuriesDirect) by EpisodeId
Output
EpisodeId | totalInjuries |
---|---|
11091 | 0 |
11074 | 0 |
11078 | 0 |
11749 | 0 |
12554 | 3 |
12561 | 0 |
13183 | 0 |
11780 | 0 |
11781 | 0 |
12826 | 0 |
35 - scan operator
Scans data, matches, and builds sequences based on the predicates.
Matching records are determined according to predicates defined in the operator’s steps. A predicate can depend on the state that is generated by previous steps. The output for the matching record is determined by the input record and assignments defined in the operator’s steps.
Syntax
T | scan
[ with_match_id
=
MatchIdColumnName ] [ declare
(
ColumnDeclarations )
] with
(
StepDefinitions )
ColumnDeclarations syntax
ColumnName :
ColumnType[=
DefaultValue ] [,
… ]
StepDefinition syntax
step
StepName [ output
= all
| last
| none
] :
Condition [ =>
Column =
Assignment [,
… ] ] ;
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular source. |
MatchIdColumnName | string | The name of a column of type long that is appended to the output as part of the scan execution. Indicates the 0-based index of the match for the record. | |
ColumnDeclarations | string | Declares an extension to the schema of T. These columns are assigned values in the steps. If not assigned, the DefaultValue is returned. Unless otherwise specified, DefaultValue is null . | |
StepName | string | ✔️ | Used to reference values in the state of scan for conditions and assignments. The step name must be unique. |
Condition | string | ✔️ | An expression that evaluates to true or false that defines which records from the input match the step. A record matches the step when the condition is true with the step’s state or with the previous step’s state. |
Assignment | string | A scalar expression that is assigned to the corresponding column when a record matches a step. | |
output | string | Controls the output logic of the step on repeated matches. all outputs all records matching the step, last outputs only the last record in a series of repeating matches for the step, and none doesn’t output records matching the step. The default is all . |
Returns
A record for each match of a record from the input to a step. The schema of the output is the schema of the source extended with the columns in the declare clause.
Scan logic
scan
goes over the serialized input data, record by record, comparing each record against each step’s condition while taking into account the current state of each step.
State
The underlying state of the scan
operator can be thought of as a table with a row for each step
. Each step maintains its own state with the latest values of the columns and declared variables from all of the previous steps and the current step. If relevant, it also holds the match ID for the ongoing sequence.
If a scan operator has n steps named s_1, s_2, …, s_n then step s_k would have k records in its state corresponding to s_1, s_2, …, s_k. The StepName.ColumnName format is used to reference a value in the state. For instance, s_2.col1
would reference column col1
that belongs to step s_2 in the state of s_k. For a detailed example, see the scan logic walkthrough.
The state starts empty and updates whenever a scanned input record matches a step. When the state of the current step is nonempty, the step is referred to as having an active sequence.
Matching logic
Each input record is evaluated against all of the steps in reverse order, from the last step to the first. When a record r is evaluated against some step s_k, the following logic is applied:
Check 1: If the state of the previous step (s_k-1) is nonempty, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:
- The state of s_k is cleared.
- The state of s_k-1 is promoted to become the state of s_k.
- The assignments of s_k are calculated and extend r.
- The extended r is added to the output and to the state of s_k.
[!NOTE] If Check 1 results in a match, Check 2 is disregarded, and r moves on to be evaluated against s_k-1.
Check 2: If the state of s_k has an active sequence or s_k is the first step, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:
- The assignments of s_k are calculated and extend r.
- The values that represent s_k in the state of s_k are replaced with the values of the extended r.
- If s_k is defined as output=all, the extended r is added to the output.
- If s_k is the first step, a new sequence begins and the match ID increases by 1. This only affects the output when with_match_id is used.
Once the checks for s_k are complete, r moves on to be evaluated against s_k-1.
For a detailed example of this logic, see the scan logic walkthrough.
Examples
The examples in this section show how to use the syntax to help you get started.
Cumulative sum
Calculate the cumulative sum for an input column. The result of this example is equivalent to using row_cumsum().
range x from 1 to 5 step 1
| scan declare (cumulative_x:long=0) with
(
step s1: true => cumulative_x = x + s1.cumulative_x;
)
Output
x | cumulative_x |
---|---|
1 | 1 |
2 | 3 |
3 | 6 |
4 | 10 |
5 | 15 |
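The equivalence mentioned above can be illustrated directly; a minimal sketch using row_cumsum() (an explicit serialize is included here to guarantee ordered input):
range x from 1 to 5 step 1
| serialize
| extend cumulative_x = row_cumsum(x)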
Cumulative sum on multiple columns with a reset condition
Calculate the cumulative sum for two input columns, and reset the sum to the current record's value whenever the cumulative sum reaches 10 or more.
range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with
(
step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x),
cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)
Output
x | y | cumulative_x | cumulative_y |
---|---|---|---|
1 | 2 | 1 | 2 |
2 | 4 | 3 | 6 |
3 | 6 | 6 | 12 |
4 | 8 | 10 | 8 |
5 | 10 | 5 | 18 |
Fill forward a column
Fill forward a string column. Each empty value is assigned the last seen nonempty value.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "",
2m, "B",
3m, "",
4m, "",
6m, "C",
8m, "",
11m, "D",
12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with
(
step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)
Output
Ts | Event | Event_filled |
---|---|---|
00:00:00 | A | A |
00:01:00 | | A |
00:02:00 | B | B |
00:03:00 | | B |
00:04:00 | | B |
00:06:00 | C | C |
00:08:00 | | C |
00:11:00 | D | D |
00:12:00 | | D |
Sessions tagging
Divide the input into sessions: a session ends 30 minutes after the first event of the session, after which a new session starts. Note the use of the with_match_id flag, which assigns a unique value for each distinct match (session) of the scan. Also note the special use of two steps in this example: inSession has true as its condition, so it captures and outputs all the records from the input, while endSession captures records that happen more than 30m from the sessionStart value for the current match. The endSession step has output=none, meaning it doesn't produce output records. The endSession step is used to advance the state of the current match from inSession to endSession, allowing a new match (session) to begin, starting from the current record.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "A",
2m, "B",
3m, "D",
32m, "B",
36m, "C",
38m, "D",
41m, "E",
75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with
(
step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
step endSession output=none: Ts - inSession.sessionStart > 30m;
)
Output
Ts | Event | sessionStart | session_id |
---|---|---|---|
00:00:00 | A | 00:00:00 | 0 |
00:01:00 | A | 00:00:00 | 0 |
00:02:00 | B | 00:00:00 | 0 |
00:03:00 | D | 00:00:00 | 0 |
00:32:00 | B | 00:32:00 | 1 |
00:36:00 | C | 00:32:00 | 1 |
00:38:00 | D | 00:32:00 | 1 |
00:41:00 | E | 00:32:00 | 1 |
01:15:00 | A | 01:15:00 | 2 |
Events between Start and Stop
Find all sequences of events between the event Start and the event Stop that occur within 5 minutes. Assign a match ID for each sequence.
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "Start",
2m, "B",
3m, "D",
4m, "Stop",
6m, "C",
8m, "Start",
11m, "E",
12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with
(
step s1: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)
Output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
Calculate a custom funnel of events
Calculate a funnel completion of the sequence Hail -> Tornado -> Thunderstorm Wind by State, with custom thresholds on the times between the events (Tornado within 1h and Thunderstorm Wind within 2h). This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.
StormEvents
| partition hint.strategy=native by State
(
sort by StartTime asc
| scan with
(
step hail: EventType == "Hail";
step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
)
)
| summarize dcount(State) by EventType
Output
EventType | dcount_State |
---|---|
Hail | 50 |
Tornado | 34 |
Thunderstorm Wind | 32 |
Scan logic walkthrough
This section demonstrates the scan logic using a step-by-step walkthrough of the Events between start and stop example:
let Events = datatable (Ts: timespan, Event: string) [
0m, "A",
1m, "Start",
2m, "B",
3m, "D",
4m, "Stop",
6m, "C",
8m, "Start",
11m, "E",
12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with
(
step s1: Event == "Start";
step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)
Output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
The state
Think of the state of the scan
operator as a table with a row for each step, in which each step has its own state. This state contains the latest values of the columns and declared variables from all of the previous steps and the current step. To learn more, see State.
For this example, the state can be represented with the following table:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | | | | | | X | X |
s3 | | | | | | | |
The “X” indicates that a specific field is irrelevant for that step.
The matching logic
This section follows the matching logic through each record of the Events
table, explaining the transformation of the state and output at each step.
Record 1
Ts | Event |
---|---|
0m | “A” |
Record evaluation at each step:
- s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
- s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition of Event == "Start". Record 1 is discarded without affecting the state or output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | | | | | | X | X |
s3 | | | | | | | |
Record 2
Ts | Event |
---|---|
1m | “Start” |
Record evaluation at each step:
- s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
- s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there's no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence, and the m_id is assigned. Record 2 and its m_id (0) are added to the state and the output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | 0 | 00:01:00 | “Start” | X | X | X | X |
s2 | | | | | | X | X |
s3 | | | | | | | |
Record 3
Ts | Event |
---|---|
2m | “B” |
Record evaluation at each step:
- s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
- s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 3 and its m_id (0) are added to the state and the output.
- s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | 0 | 00:01:00 | "Start" | 00:02:00 | "B" | X | X |
s3 | | | | | | | |
Record 4
Ts | Event |
---|---|
3m | “D” |
Record evaluation at each step:
- s3: Check 1 isn't passed because the record doesn't meet the condition of Event == "Stop", and Check 2 isn't passed because s3 lacks an active sequence.
- s2: Check 1 isn't passed because the state of s1 is empty. Check 2 is passed because the record meets the condition of Ts - s1.Ts <= 5m. Record 4 and its m_id (0) are added to the state and the output. The values from this record overwrite the previous state values for s2.Ts and s2.Event.
- s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | 0 | 00:01:00 | "Start" | 00:03:00 | "D" | X | X |
s3 | | | | | | | |
Record 5
Ts | Event |
---|---|
4m | “Stop” |
Record evaluation at each step:
- s3: Check 1 is passed because the state of s2 is nonempty and the record meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 5 and its m_id (0) are added to the state and the output.
- s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | | | | | | X | X |
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 6
Ts | Event |
---|---|
6m | “C” |
Record evaluation at each step:
- s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because the record doesn't meet the s3 condition of Event == "Stop".
- s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start". Record 6 is discarded without affecting the state or output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | | | | | | X | X |
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 7
Ts | Event |
---|---|
8m | “Start” |
Record evaluation at each step:
- s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Stop".
- s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there's no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence in s1 with a new m_id. Record 7 and its m_id (1) are added to the state and the output.
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | 1 | 00:08:00 | “Start” | X | X | X | X |
s2 | | | | | | X | X |
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 8
Ts | Event |
---|---|
11m | “E” |
Record evaluation at each step:
- s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because the record doesn't meet the s3 condition of Event == "Stop".
- s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 8 and its m_id (1) are added to the state and the output.
- s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | 1 | 00:08:00 | “Start” | 00:11:00 | “E” | X | X |
s3 | 0 | 00:01:00 | “Start” | 00:03:00 | “D” | 00:04:00 | “Stop” |
Record 9
Ts | Event |
---|---|
12m | “Stop” |
Record evaluation at each step:
- s3: Check 1 is passed because the state of s2 is nonempty and the record meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 9 and its m_id (1) are added to the state and the output.
- s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
- s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".
State:
step | m_id | s1.Ts | s1.Event | s2.Ts | s2.Event | s3.Ts | s3.Event |
---|---|---|---|---|---|---|---|
s1 | | | | X | X | X | X |
s2 | | | | | | X | X |
s3 | 1 | 00:08:00 | “Start” | 00:11:00 | “E” | 00:12:00 | “Stop” |
Final output
Ts | Event | m_id |
---|---|---|
00:01:00 | Start | 0 |
00:02:00 | B | 0 |
00:03:00 | D | 0 |
00:04:00 | Stop | 0 |
00:08:00 | Start | 1 |
00:11:00 | E | 1 |
00:12:00 | Stop | 1 |
36 - search operator
Searches a text pattern in multiple tables and columns.
Syntax
[T |
] search
[kind=
CaseSensitivity ] [in
(
TableSources)
] SearchPredicate
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | The tabular data source to be searched over, such as a table name, a union operator, or the results of a tabular query. Can’t be specified together with TableSources. | |
CaseSensitivity | string | A flag that controls the behavior of all string scalar operators, such as has , with respect to case sensitivity. Valid values are default , case_insensitive , case_sensitive . The options default and case_insensitive are synonymous, since the default behavior is case insensitive. | |
TableSources | string | A comma-separated list of “wildcarded” table names to take part in the search. The list has the same syntax as the list of the union operator. Can’t be specified together with tabular data source (T). | |
SearchPredicate | string | ✔️ | A boolean expression to be evaluated for every record in the input. If it returns true , the record is outputted. See Search predicate syntax. |
Search predicate syntax
The SearchPredicate allows you to search for specific terms in all columns of a table. The operator that is applied to a search term depends on the presence and placement of a wildcard asterisk (*
) in the term, as shown in the following table.
Literal | Operator |
---|---|
billg | has |
*billg | hassuffix |
billg* | hasprefix |
*billg* | contains |
bi*lg | matches regex |
You can also restrict the search to a specific column, look for an exact match instead of a term match, or search by regular expression. The syntax for each of these cases is shown in the following table.
Syntax | Explanation |
---|---|
ColumnName: StringLiteral | This syntax can be used to restrict the search to a specific column. The default behavior is to search all columns. |
ColumnName== StringLiteral | This syntax can be used to search for exact matches of a column against a string value. The default behavior is to look for a term-match. |
Column matches regex StringLiteral | This syntax indicates regular expression matching, in which StringLiteral is the regex pattern. |
Use boolean expressions to combine conditions and create more complex searches. For example, "error" and x==123 would result in a search for records that have the term error in any column and the value 123 in the x column.
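For instance, a minimal sketch against the ContosoSales data used in the examples below (the CustomerKey value is taken from that sample output):
search "Green" and CustomerKey == 16549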
Search predicate syntax examples
# | Syntax | Meaning (equivalent where ) | Comments |
---|---|---|---|
1 | search "err" | where * has "err" | |
2 | search in (T1,T2,A*) "err" | union T1,T2,A* | where * has “err” | |
3 | search col:"err" | where col has "err" | |
4 | search col=="err" | where col=="err" | |
5 | search "err*" | where * hasprefix "err" | |
6 | search "*err" | where * hassuffix "err" | |
7 | search "*err*" | where * contains "err" | |
8 | search "Lab*PC" | where * matches regex @"\bLab.*PC\b" | |
9 | search * | where 0==0 | |
10 | search col matches regex "..." | where col matches regex "..." | |
11 | search kind=case_sensitive | All string comparisons are case-sensitive | |
12 | search "abc" and ("def" or "hij") | where * has "abc" and (* has "def" or * has "hij") |
13 | search "err" or (A>a and A<b) | where * has "err" or (A>a and A<b) |
Remarks
Unlike the find operator, the search
operator doesn’t support the following syntax:
- withsource=: The output always includes a column called $table of type string whose value is the table name from which each record was retrieved (or some system-generated name if the source isn't a table but a composite expression).
- project=, project-smart: The output schema is equivalent to the project-smart output schema.
Examples
The examples in this section show how to use the syntax to help you get started.
Global term search
Search for the term Green in all the tables of the ContosoSales database.
The output finds records with the term Green as a last name or a color in the Customers
, Products
, and SalesTable
tables.
search "Green"
Output
$table | CityName | ContinentName | CustomerKey | Education | FirstName | Gender | LastName |
---|---|---|---|---|---|---|---|
Customers | Ballard | North America | 16549 | Partial College | Mason | M | Green |
Customers | Bellingham | North America | 2070 | High School | Adam | M | Green |
Customers | Bellingham | North America | 10658 | Bachelors | Sara | F | Green |
Customers | Beverly Hills | North America | 806 | Graduate Degree | Richard | M | Green |
Customers | Beverly Hills | North America | 7674 | Graduate Degree | James | M | Green |
Customers | Burbank | North America | 5241 | Graduate Degree | Madeline | F | Green |
Conditional global term search
Search for records that contain the term Green and one of either terms Deluxe or Proseware in the ContosoSales database.
search "Green" and ("Deluxe" or "Proseware")
Output
$table | ProductName | Manufacturer | ColorName | ClassName | ProductCategoryName |
---|---|---|---|---|---|
Products | Contoso 8GB Clock & Radio MP3 Player X850 Green | Contoso, Ltd | Green | Deluxe | Audio |
Products | Proseware Scan Jet Digital Flat Bed Scanner M300 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware All-In-One Photo Printer M200 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Ink Jet Wireless All-In-One Printer M400 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Ink Jet Instant PDF Sheet-Fed Scanner M300 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Desk Jet All-in-One Printer, Scanner, Copier M350 Green | Proseware, Inc. | Green | Regular | Computers |
Products | Proseware Duplex Scanner M200 Green | Proseware, Inc. | Green | Regular | Computers |
Search a specific table
Search for the term Green only in the Products table.
search in (Products) "Green"
Output
$table | ProductName | Manufacturer | ColorName |
---|---|---|---|
Products | Contoso 4G MP3 Player E400 Green | Contoso, Ltd | Green |
Products | Contoso 8GB Super-Slim MP3/Video Player M800 Green | Contoso, Ltd | Green |
Products | Contoso 16GB Mp5 Player M1600 Green | Contoso, Ltd | Green |
Products | Contoso 8GB Clock & Radio MP3 Player X850 Green | Contoso, Ltd | Green |
Products | NT Wireless Bluetooth Stereo Headphones M402 Green | Northwind Traders | Green |
Products | NT Wireless Transmitter and Bluetooth Headphones M150 Green | Northwind Traders | Green |
Case-sensitive search
Search for records that match the case-sensitive term in the ContosoSales database.
search kind=case_sensitive "blue"
Output
$table | ProductName | Manufacturer | ColorName | ClassName |
---|---|---|---|---|
Products | Contoso 16GB New Generation MP5 Player M1650 blue | Contoso, Ltd | blue | Regular |
Products | Contoso Bright Light battery E20 blue | Contoso, Ltd | blue | Economy |
Products | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
NewSales | Litware 120mm Blue LED Case Fan E901 blue | Litware, Inc. | blue | Economy |
Search specific columns
Search for the terms Aaron and Hughes, in the “FirstName” and “LastName” columns respectively, in the ContosoSales database.
search FirstName:"Aaron" or LastName:"Hughes"
Output
$table | CustomerKey | Education | FirstName | Gender | LastName |
---|---|---|---|---|---|
Customers | 18285 | High School | Riley | F | Hughes |
Customers | 802 | Graduate Degree | Aaron | M | Sharma |
Customers | 986 | Bachelors | Melanie | F | Hughes |
Customers | 12669 | High School | Jessica | F | Hughes |
Customers | 13436 | Graduate Degree | Mariah | F | Hughes |
Customers | 10152 | Graduate Degree | Aaron | M | Campbell |
Limit search by timestamp
Search for the term Hughes in the ContosoSales database, where the term appears in a record with a date later than the given datetime value.
search "Hughes" and DateKey > datetime('2009-01-01')
Output
$table | DateKey | SalesAmount_real |
---|---|---|
SalesTable | 2021-12-13T00:00:00Z | 446.4715 |
SalesTable | 2021-12-13T00:00:00Z | 120.555 |
SalesTable | 2021-12-13T00:00:00Z | 48.4405 |
SalesTable | 2021-12-13T00:00:00Z | 39.6435 |
SalesTable | 2021-12-13T00:00:00Z | 56.9905 |
Performance Tips
# | Tip | Prefer | Over |
---|---|---|---|
1 | Prefer to use a single search operator over several consecutive search operators | search "billg" and ("steveb" or "satyan") | search “billg” | search “steveb” or “satyan” |
2 | Prefer to filter inside the search operator | search "billg" and "steveb" | search * | where * has “billg” and * has “steveb” |
37 - serialize operator
Marks that the order of the input row set is safe to use for window functions.
The operator has a declarative meaning. It marks the input row set as serialized (ordered), so that window functions can be applied to it.
Syntax
serialize
[Name1 =
Expr1 [,
Name2 =
Expr2]…]
Parameters
Name | Type | Required | Description |
---|---|---|---|
Name | string | The name of the column to add or update. If omitted, the output column name is automatically generated. | |
Expr | string | ✔️ | The calculation to perform over the input. |
Examples
The examples in this section show how to use the syntax to help you get started.
Serialize subset of rows by condition
This query retrieves all log entries from the TraceLogs table that have a specific ClientRequestId and preserves the order of these entries during processing.
TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize
Output
This table only shows the top 5 query results.
Timestamp | Node | Component | ClientRequestId | Message |
---|---|---|---|---|
2014-03-08T12:24:55.5464757Z | Engine000000000757 | INGESTOR_GATEWAY | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | $$IngestionCommand table=fogEvents format=json |
2014-03-08T12:24:56.0929514Z | Engine000000000757 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"" |
2014-03-08T12:25:40.3574831Z | Engine000000000341 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"" |
2014-03-08T12:25:40.9039588Z | Engine000000000341 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"" |
2014-03-08T12:26:25.1684905Z | Engine000000000057 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"" |
… | … | … | … | … |
Add row number to the serialized table
To add a row number to the serialized table, use the row_number() function.
TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize rn = row_number()
Output
This table only shows the top 5 query results.
Timestamp | rn | Node | Component | ClientRequestId | Message |
---|---|---|---|---|---|
2014-03-08T13:00:01.6638235Z | 1 | Engine000000000899 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_46.json.gz"" |
2014-03-08T13:00:02.2102992Z | 2 | Engine000000000899 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"" |
2014-03-08T13:00:46.4748309Z | 3 | Engine000000000584 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"" |
2014-03-08T13:00:47.0213066Z | 4 | Engine000000000584 | DOWNLOADER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | Downloading file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"" |
2014-03-08T13:01:31.2858383Z | 5 | Engine000000000380 | INGESTOR_EXECUTER | 5a848f70-9996-eb17-15ed-21b8eb94bf0e | IngestionCompletionEvent: finished ingestion file path: ““https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"" |
… | … | … | … | … |
Serialization behavior of operators
The output row set of the following operators is marked as serialized.
The output row set of the following operators is marked as nonserialized.
All other operators preserve the serialization property. If the input row set is serialized, then the output row set is also serialized.
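A minimal sketch of this behavior, assuming the TraceLogs table and columns from the examples above: sort marks its output as serialized and where preserves that property, so row_number() can be applied without an explicit serialize:
TraceLogs
| sort by Timestamp asc
| where Component == "DOWNLOADER"
| extend rn = row_number()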
38 - Shuffle query
The shuffle
query is a semantic-preserving transformation used with a set of operators that support the shuffle
strategy. Depending on the data involved, querying with the shuffle
strategy can yield better performance. It’s better to use the shuffle query strategy when the shuffle
key (a join
key, summarize
key, make-series
key or partition
key) has a high cardinality and the regular operator query hits query limits.
You can use the following operators with the shuffle command: join, summarize, make-series, and partition.
To use the shuffle
query strategy, add the expression hint.strategy = shuffle
or hint.shufflekey = <key>
. When you use hint.strategy=shuffle
, the operator data will be shuffled by all the keys. Use this expression when the compound key is unique but each key isn’t unique enough, so you’ll shuffle the data using all the keys of the shuffled operator.
When partitioning data with the shuffle strategy, the data load is shared on all cluster nodes. Each node processes one partition of the data. The default number of partitions is equal to the number of cluster nodes.
The partition number can be overridden by using the syntax hint.num_partitions = total_partitions
, which will control the number of partitions. This is useful when the cluster has a small number of cluster nodes and the default partitions number will be small, and the query fails or takes a long execution time.
In some cases, the hint.strategy = shuffle
is ignored, and the query won’t run in shuffle
strategy. This can happen when:
- The join operator has another shuffle-compatible operator (join, summarize, make-series or partition) on the left side or the right side.
- The summarize operator appears after another shuffle-compatible operator (join, summarize, make-series or partition) in the query.
Syntax
With hint.strategy
= shuffle
T |
DataExpression |
join
hint.strategy
= shuffle
(
DataExpression )
T |
summarize
hint.strategy
= shuffle
DataExpression
T |
Query |
partition hint.strategy
= shuffle
(
SubQuery )
With hint.shufflekey
= key
T |
DataExpression |
join
hint.shufflekey
= key (
DataExpression )
T |
summarize
hint.shufflekey
= key DataExpression
T |
make-series
hint.shufflekey
= key DataExpression
T |
Query |
partition hint.shufflekey
= key (
SubQuery )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular source whose data is to be processed by the operator. |
DataExpression | string | An implicit or explicit tabular transformation expression. | |
Query | string | A transformation expression run on the records of T. | |
key | string | Use a join key, summarize key, make-series key or partition key. | |
SubQuery | string | A transformation expression. |
Examples
The examples in this section show how to use the syntax to help you get started.
Use summarize with shuffle
The shuffle
strategy query with summarize
operator shares the load on all cluster nodes, where each node processes one partition of the data.
StormEvents
| summarize hint.strategy = shuffle count(), avg(InjuriesIndirect) by State
| count
Output
Count |
---|
67 |
Use join with shuffle
StormEvents
| where State has "West"
| where EventType has "Flood"
| join hint.strategy=shuffle
(
StormEvents
| where EventType has "Hail"
| project EpisodeId, State, DamageProperty
)
on State
| count
Output
Count |
---|
103 |
Use make-series with shuffle
StormEvents
| where State has "North"
| make-series hint.shufflekey = State sum(DamageProperty) default = 0 on StartTime in range(datetime(2007-01-01 00:00:00.0000000), datetime(2007-01-31 23:59:00.0000000), 15d) by State
Output
State | sum_DamageProperty | StartTime | |
---|---|---|---|
NORTH DAKOTA | [60000,0,0] | [“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”] | |
NORTH CAROLINA | [20000,0,1000] | [“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”] | |
ATLANTIC NORTH | [0,0,0] | [“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”] |
Use partition with shuffle
StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
top 3 by DamageProperty
| project EpisodeId, State, DamageProperty
)
| count
Output
Count |
---|
22345 |
Compare hint.strategy=shuffle and hint.shufflekey=key
When you use hint.strategy=shuffle
, the shuffled operator will be shuffled by all the keys. In the following example, the query shuffles the data using both EpisodeId
and EventId
as keys:
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
| count
Output
Count |
---|
14 |
The following query uses hint.shufflekey = key and is equivalent to the query above.
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.shufflekey = EpisodeId hint.shufflekey = EventId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
Output
Count |
---|
14 |
Shuffle the data with multiple keys
In some cases, the hint.strategy=shuffle will be ignored, and the query won't run in shuffle strategy. In the following example, the join has a summarize on its left side, so using hint.strategy=shuffle won't apply the shuffle strategy to the query:
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
Output
EpisodeId | EventId | … | EpisodeId1 | EventId1 | … |
---|---|---|---|---|---|
1030 | 4407 | … | 1030 | 4407 | … |
1030 | 13721 | … | 1030 | 13721 | … |
2477 | 12530 | … | 2477 | 12530 | … |
2103 | 10237 | … | 2103 | 10237 | … |
2103 | 10239 | … | 2103 | 10239 | … |
… | … | … | … | … | … |
To overcome this issue and run in shuffle strategy, choose the key that is common for the summarize
and join
operations. In this case, this key is EpisodeId
. Use the hint hint.shufflekey
to specify the shuffle key on the join
to hint.shufflekey = EpisodeId
:
StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.shufflekey=EpisodeId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
Output
EpisodeId | EventId | … | EpisodeId1 | EventId1 | … |
---|---|---|---|---|---|
1030 | 4407 | … | 1030 | 4407 | … |
1030 | 13721 | … | 1030 | 13721 | … |
2477 | 12530 | … | 2477 | 12530 | … |
2103 | 10237 | … | 2103 | 10237 | … |
2103 | 10239 | … | 2103 | 10239 | … |
… | … | … | … | … | … |
Use summarize with shuffle to improve performance
In this example, using the summarize
operator with shuffle
strategy improves performance. The source table has 150M records and the cardinality of the group by key is 10M, which is spread over 10 cluster nodes.
Using summarize
operator without shuffle
strategy, the query ends after 1:08 and the memory usage peak is ~3 GB:
orders
| summarize arg_max(o_orderdate, o_totalprice) by o_custkey
| where o_totalprice < 1000
| count
Output
Count |
---|
1086 |
While using shuffle
strategy with summarize
, the query ends after ~7 seconds and the memory usage peak is 0.43 GB:
orders
| summarize hint.strategy = shuffle arg_max(o_orderdate, o_totalprice) by o_custkey
| where o_totalprice < 1000
| count
Output
Count |
---|
1086 |
The following example demonstrates performance on a cluster that has two cluster nodes, with a table that has 60M records, where the cardinality of the group by key is 2M.
Running the query without hint.num_partitions will use only two partitions (matching the number of cluster nodes), and the following query will take ~1:10 mins:
lineitem
| summarize hint.strategy = shuffle dcount(l_comment), dcount(l_shipdate) by l_partkey
| consume
If setting the partitions number to 10, the query will end after 23 seconds:
lineitem
| summarize hint.strategy = shuffle hint.num_partitions = 10 dcount(l_comment), dcount(l_shipdate) by l_partkey
| consume
Use join with shuffle to improve performance
The following example shows how using shuffle
strategy with the join
operator improves performance.
The examples were sampled on a cluster with 10 nodes where the data is spread over all these nodes.
The query’s left-side source table has 15M records where the cardinality of the join
key is ~14M. The query’s right-side source has 150M records and the cardinality of the join
key is 10M. The query ends after ~28 seconds and the memory usage peak is 1.43 GB:
customer
| join
orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey
When using shuffle
strategy with a join
operator, the query ends after ~4 seconds and the memory usage peak is 0.3 GB:
customer
| join
hint.strategy = shuffle orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey
In another example, we try the same queries on a larger dataset with the following conditions:
- Left-side source of the
join
is 150M and the cardinality of the key is 148M. - Right-side source of the
join
is 1.5B, and the cardinality of the key is ~100M.
The query with just the join
operator hits limits and times-out after 4 mins. However, when using shuffle
strategy with the join
operator, the query ends after ~34 seconds and the memory usage peak is 1.23 GB.
The following example shows the improvement on a cluster that has two cluster nodes, with a table of 60M records, where the cardinality of the join
key is 2M.
Running the query without hint.num_partitions will use only two partitions (matching the number of cluster nodes), and the following query will take ~1:10 mins:
lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
hint.shufflekey = l_partkey part
on $left.l_partkey == $right.p_partkey
| consume
When setting the partitions number to 10, the query will end after 23 seconds:
lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
hint.shufflekey = l_partkey hint.num_partitions = 10 part
on $left.l_partkey == $right.p_partkey
| consume
39 - sort operator
Sorts the rows of the input table into order by one or more columns.
Syntax
T | sort by
column [asc
| desc
] [nulls first
| nulls last
] [,
…]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to sort. |
column | scalar | ✔️ | The column of T by which to sort. The type of the column values must be numeric, date, time or string. |
asc or desc | string | asc sorts into ascending order, low to high. Default is desc , high to low. | |
nulls first or nulls last | string | nulls first will place the null values at the beginning and nulls last will place the null values at the end. Default for asc is nulls first . Default for desc is nulls last . |
Returns
A copy of the input table sorted in either ascending or descending order based on the provided column.
Using special floating-point values
When the input table contains the special values null
, NaN
, -inf
and +inf
, the order will be as follows:
Value | Ascending | Descending |
---|---|---|
Nulls first | null, NaN, -inf, -5, 0, 5, +inf | null, NaN, +inf, 5, 0, -5, -inf |
Nulls last | -inf, -5, 0, 5, +inf, NaN, null | +inf, 5, 0, -5, -inf, NaN, null |
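A minimal sketch that reproduces the ascending, nulls first ordering with a literal datatable (the specific values are arbitrary):
datatable(x: real) [real(null), real(nan), real(-inf), -5.0, 0.0, 5.0, real(+inf)]
| sort by x asc nulls first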
Example
The following example shows storm events by state in alphabetical order with the most recent storms in each state appearing first.
StormEvents
| sort by State asc, StartTime desc
Output
This table only shows the top 10 query results.
StartTime | State | EventType | … |
---|---|---|---|
2007-12-28T12:10:00Z | ALABAMA | Hail | … |
2007-12-28T04:30:00Z | ALABAMA | Hail | … |
2007-12-28T04:16:00Z | ALABAMA | Hail | … |
2007-12-28T04:15:00Z | ALABAMA | Hail | … |
2007-12-28T04:13:00Z | ALABAMA | Hail | … |
2007-12-21T14:30:00Z | ALABAMA | Strong Wind | … |
2007-12-20T18:15:00Z | ALABAMA | Strong Wind | … |
2007-12-20T18:00:00Z | ALABAMA | Strong Wind | … |
2007-12-20T18:00:00Z | ALABAMA | Strong Wind | … |
2007-12-20T17:45:00Z | ALABAMA | Strong Wind | … |
2007-12-20T17:45:00Z | ALABAMA | Strong Wind | … |
40 - take operator
Return up to the specified number of rows.
There is no guarantee which records are returned, unless the source data is sorted. If the data is sorted, then the top values will be returned.
Syntax
take
NumberOfRows
Parameters
Name | Type | Required | Description |
---|---|---|---|
NumberOfRows | int | ✔️ | The number of rows to return. |
Paging of query results
Methods for implementing paging include:
- Export the result of a query to external storage and page through the generated data.
- Write a middle-tier application that provides a stateful paging API by caching the results of a Kusto query.
- Use pagination in Stored query results
Example
StormEvents | take 5
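If deterministic results are needed, sort the input before taking rows; a minimal sketch using the StormEvents table:
StormEvents
| sort by StartTime desc
| take 5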
41 - top operator
Returns the first N records sorted by the specified column.
Syntax
T | top
NumberOfRows by
Expression [asc
| desc
] [nulls first
| nulls last
]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input to sort. |
NumberOfRows | int | ✔️ | The number of rows of T to return. |
Expression | string | ✔️ | The scalar expression by which to sort. |
asc or desc | string | Controls whether the selection is from the “bottom” or “top” of the range. Default desc . | |
nulls first or nulls last | string | Controls whether null values appear at the “bottom” or “top” of the range. Default for asc is nulls first . Default for desc is nulls last . |
Example
Show the top three storms with the most direct injuries.
StormEvents
| top 3 by InjuriesDirect
The table below shows only the relevant column. Run the query above to see more storm details for these events.
InjuriesDirect | … |
---|---|
519 | … |
422 | … |
200 | … |
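The asc and nulls keywords from the parameters table work the same way; for example, a minimal sketch that returns the three storm events with the fewest direct injuries, pushing null values to the end:
StormEvents
| top 3 by InjuriesDirect asc nulls last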
Related content
- Use top-nested operator to produce hierarchical (nested) top results.
42 - top-hitters operator
Returns an approximation for the most popular distinct values, or the values with the largest sum, in the input.
Syntax
T |
top-hitters
NumberOfValues of
ValueExpression [ by
SummingExpression ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
NumberOfValues | int, long, or real | ✔️ | The number of distinct values of ValueExpression. |
ValueExpression | string | ✔️ | An expression over the input table T whose distinct values are returned. |
SummingExpression | string | If specified, a numeric expression over the input table T whose sum per distinct value of ValueExpression establishes which values to emit. If not specified, the count of each distinct value of ValueExpression is used instead. |
Remarks
The first syntax (no SummingExpression) is conceptually equivalent to:
T
| summarize C=count() by ValueExpression
| top NumberOfValues by C desc
The second syntax (with SummingExpression) is conceptually equivalent to:
T
| summarize S=sum(SummingExpression) by ValueExpression
| top NumberOfValues by S desc
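As a concrete sketch of the second form, using the StormEvents table from the examples below, a query like the following would return the five states with the largest total property damage:
StormEvents
| top-hitters 5 of State by DamageProperty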
Examples
Get most frequent items
StormEvents
| top-hitters 5 of EventType
Output
EventType | approximate_count_EventType |
---|---|
Thunderstorm Wind | 13015 |
Hail | 12711 |
Flash Flood | 3688 |
Drought | 3616 |
Winter Weather | 3349 |
Get top hitters based on column value
The next example shows how to find the States with the most “Thunderstorm Wind” events.
StormEvents
| where EventType == "Thunderstorm Wind"
| top-hitters 10 of State
Output
State | approximate_count_State |
---|---|
TEXAS | 830 |
GEORGIA | 609 |
MICHIGAN | 602 |
IOWA | 585 |
PENNSYLVANIA | 549 |
ILLINOIS | 533 |
NEW YORK | 502 |
VIRGINIA | 482 |
KANSAS | 476 |
OHIO | 455 |
43 - top-nested operator
The top-nested
operator performs hierarchical aggregation and value selection.
Imagine you have a table with sales information like regions, salespeople, and amounts sold. The top-nested
operator can help you answer complex questions, such as “What are the top five regions by sales, and who are the top three salespeople in each of those regions?”
The source data is partitioned based on the criteria set in the first top-nested
clause, such as region. Next, the operator picks the top records in each partition using an aggregation, such as adding sales amounts. Each subsequent top-nested
clause refines the partitions created by the previous clause, creating a hierarchy of more precise groups.
The result is a table with two columns per clause. One column holds the partitioning values, such as region, while the other column holds the outcomes of the aggregation calculation, like the sum of sales.
Syntax
T |
top-nested
[ N ] of
Expr [with
others
=
ConstExpr] by
Aggregation [asc
| desc
] [,
top-nested
… ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
N | int | The number of top values to be returned for this hierarchy level. If omitted, all distinct values are returned. | |
Expr | string | ✔️ | An expression over the input record indicating which value to return for this hierarchy level. Typically, it refers to a column from T or involves a calculation like bin() on a column. Optionally, set an output column name as Name = Expr. |
ConstExpr | string | If specified, for each hierarchy level, one record is added with the value that is the aggregation over all records that didn’t make it to the top. | |
Aggregation | string | The aggregation function applied to records with the same Expr value. The result determines the top records. See Supported aggregation functions. Optionally, set an output column name as Name = Aggregation. |
Supported aggregation functions
The following aggregation functions are supported:
Returns
A table with two columns for each clause. One column contains unique values computed using Expr, and the other column shows the results obtained from the Aggregation calculation.
Using the with
others
clause
Using the top-nested
operator with with
others
adds the ability to see your top content contextualized in a wider data set. Evaluating your data in this way is valuable when rendering the data visually.
Include data from other columns
Only columns specified as a top-nested
clause Expr are displayed in the output table.
To include all values of a column at a specific level:
- Don’t specify the value of N.
- Use the column name as the value of Expr.
- Use
Ignore=max(1)
as the value of Aggregation. - Remove the unnecessary
Ignore
column with project-away.
For an example, see Most recent events per state with other column data.
Performance considerations
The number of records can grow exponentially with the number of top-nested
clauses, and record growth is even faster if the N parameter is not specified. This operator can consume a considerable amount of resources.
If the aggregation distribution is irregular, limit the number of distinct values to return by specifying N. Then, use the with
others
=
ConstExpr clause to get a sense of the weight of all other cases.
Examples
Top damaged states, event types, and end locations by property damage
The following query partitions the StormEvents
table by the State
column and calculates the total property damage for each state. The query selects the top two states with the largest amount of property damage. Within these top two states, the query groups the data by EventType
and selects the top three event types with the most damage. Then the query groups the data by EndLocation
and selects the EndLocation
with the highest damage. Only one EndLocation
value appears in the results, possibly due to the large nature of the storm events or not documenting the end location.
StormEvents // Data source.
| top-nested 2 of State by sum(DamageProperty), // Top 2 States by total damaged property.
top-nested 3 of EventType by sum(DamageProperty), // Top 3 EventType by total damaged property for each State.
top-nested 1 of EndLocation by sum(DamageProperty) // Top 1 EndLocation by total damaged property for each EventType and State.
| project State, EventType, EndLocation, StateTotalDamage = aggregated_State, EventTypeTotalDamage = aggregated_EventType, EndLocationDamage = aggregated_EndLocation
Output
State | EventType | EndLocation | StateTotalDamage | EventTypeTotalDamage | EndLocationDamage |
---|---|---|---|---|---|
CALIFORNIA | Wildfire | | 1445937600 | 1326315000 | 1326315000 |
CALIFORNIA | HighWind | | 1445937600 | 61320000 | 61320000 |
CALIFORNIA | DebrisFlow | | 1445937600 | 48000000 | 48000000 |
OKLAHOMA | IceStorm | | 915470300 | 826000000 | 826000000 |
OKLAHOMA | WinterStorm | | 915470300 | 40027000 | 40027000 |
OKLAHOMA | Flood | COMMERCE | 915470300 | 21485000 | 20000000 |
Top five states with property damage with others grouped
The following example uses the top-nested
operator to identify the top five states with the most property damage and uses the with
others
clause to group damaged property for all other states. It then visualizes damaged property for the top five states and all other states as a piechart
using the render
command.
StormEvents
| top-nested 5 of State with others="OtherStates" by sum(DamageProperty)
| render piechart
Most recent events per state with other column data
The following query retrieves the two most recent events for each US state with relevant event details. It uses max(1)
within certain columns to propagate data without using the top-nested selection logic. The generated Ignore
aggregation columns are removed using project-away
.
StormEvents
| top-nested of State by Ignore0=max(1), // Partition the data by each unique value of state.
top-nested 2 of StartTime by Ignore1=max(StartTime), // Get the 2 most recent events in each state.
top-nested of EndTime by Ignore2=max(1), // Append the EndTime for each event.
top-nested of EpisodeId by Ignore3=max(1) // Append the EpisodeId for each event.
| project-away Ignore* // Remove the unnecessary aggregation columns.
| order by State asc, StartTime desc // Sort results alphabetically and chronologically.
Latest records per identity with other column data
The following top-nested
example extracts the latest records per identity and builds on the concepts introduced in the previous example. The first top-nested
clause partitions the data by distinct values of id
using Ignore0=max(1)
as a placeholder. For each id
, it identifies the two most recent records based on the timestamp
. Other information is appended using a top-nested
operator without specifying a count and using Ignore2=max(1)
as a placeholder. Finally, unnecessary aggregation columns are removed using the project-away
operator.
datatable(id: string, timestamp: datetime, otherInformation: string) // Create a source datatable.
[
"Barak", datetime(2015-01-01), "1",
"Barak", datetime(2016-01-01), "2",
"Barak", datetime(2017-01-20), "3",
"Donald", datetime(2017-01-20), "4",
"Donald", datetime(2017-01-18), "5",
"Donald", datetime(2017-01-19), "6"
]
| top-nested of id by Ignore0=max(1), // Partition the data by each unique value of id.
top-nested 2 of timestamp by Ignore1=max(timestamp), // Get the 2 most recent records for each id.
top-nested of otherInformation by Ignore2=max(1) // Append otherInformation for each event.
| project-away Ignore0, Ignore1, Ignore2 // Remove the unnecessary aggregation columns.
Output
id | timestamp | otherInformation |
---|---|---|
Barak | 2016-01-01T00:00:00Z | 2 |
Donald | 2017-01-19T00:00:00Z | 6 |
Barak | 2017-01-20T00:00:00Z | 3 |
Donald | 2017-01-20T00:00:00Z | 4 |
44 - union operator
Takes two or more tables and returns the rows of all of them.
Syntax
[ T |
] union
[ UnionParameters ] [kind=
inner
|outer
] [withsource=
ColumnName] [isfuzzy=
true
|false
] Tables
[ T |
] union
[kind=
inner
|outer
] [withsource=
ColumnName] [isfuzzy=
true
|false
] Tables
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | | The input tabular expression. |
UnionParameters | string | | Zero or more space-separated parameters in the form of Name = Value that control the behavior of the row-match operation and execution plan. See supported union parameters. |
kind | string | | Either inner or outer. inner causes the result to have the subset of columns that are common to all of the input tables. outer causes the result to have all the columns that occur in any of the inputs. Cells that aren't defined by an input row are set to null. The default is outer. With outer, the result has all the columns that occur in any of the inputs, one column for each name and type occurrence. This means that if a column appears in multiple tables with multiple types, the result has a corresponding column for each type, and that column name is suffixed with a '_' followed by the origin column type. |
withsource= ColumnName | string | | If specified, the output includes a column called ColumnName whose value indicates which source table contributed each row. If the query effectively references tables from more than one database, including the default database, then the value of this column has a table name qualified with the database. Cluster and database qualifications are present in the value if more than one cluster is referenced. |
isfuzzy | bool | | If set to true, allows fuzzy resolution of union legs. The set of union sources is reduced to the set of table references that exist and are accessible while analyzing the query and preparing for execution. If at least one such table was found, any resolution failure yields a warning in the query status results, but won't prevent the query execution. If no resolutions were successful, the query returns an error. The default is false. isfuzzy=true only applies to the union sources resolution phase. Once the set of source tables is determined, possible additional query failures won't be suppressed. |
Tables | string | | One or more comma-separated table references, a query expression enclosed with parentheses, or a set of tables specified with a wildcard. For example, E* would form the union of all the tables in the database whose names begin with E. |
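To make the kind parameter concrete, here is a minimal sketch using hypothetical inline tables; only the Name column is shared between them.
// Hypothetical inline tables to illustrate the column behavior of kind=inner.
let T1 = datatable(Name: string, Score: long) ['Alice', 1];
let T2 = datatable(Name: string, City: string) ['Bob', 'Seattle'];
union kind=inner T1, T2
// Only the Name column (common to both tables) appears in the result, while rows from
// both tables are kept. With the default kind=outer, Score and City would also appear,
// set to null where a source table doesn't define them.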
Supported union parameters
Name | Type | Required | Description |
---|---|---|---|
hint.concurrency | int | | Hints the system how many concurrent subqueries of the union operator should be executed in parallel. The default is the number of CPU cores on the single node of the cluster (2 to 16). |
hint.spread | int | | Hints the system how many nodes should be used by the concurrent union subquery execution. The default is 1. |
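As a sketch of how these hints are written, the following query assumes hypothetical table names; the hint values are illustrative only.
// Hypothetical tables; hint values are illustrative, not recommendations.
union hint.concurrency=4 hint.spread=2 Table1, Table2, Table3
| summarize count()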
Name | Type | Required | Description |
---|---|---|---|
T | string | | The input tabular expression. |
kind | string | | Either inner or outer. inner causes the result to have the subset of columns that are common to all of the input tables. outer causes the result to have all the columns that occur in any of the inputs. Cells that aren't defined by an input row are set to null. The default is outer. With outer, the result has all the columns that occur in any of the inputs, one column for each name and type occurrence. This means that if a column appears in multiple tables with multiple types, the result has a corresponding column for each type, and that column name is suffixed with a '_' followed by the origin column type. |
withsource= ColumnName | string | | If specified, the output includes a column called ColumnName whose value indicates which source table contributed each row. If the query effectively references tables from more than one database, including the default database, then the value of this column has a table name qualified with the database. Cluster and database qualifications are present in the value if more than one cluster is referenced. |
isfuzzy | bool | | If set to true, allows fuzzy resolution of union legs. The set of union sources is reduced to the set of table references that exist and are accessible while analyzing the query and preparing for execution. If at least one such table was found, any resolution failure yields a warning in the query status results, but won't prevent the query execution. If no resolutions were successful, the query returns an error. However, in cross-workspace and cross-app queries, if any of the workspaces or apps isn't found, the query fails. The default is false. isfuzzy=true only applies to the union sources resolution phase. Once the set of source tables is determined, possible additional query failures won't be suppressed. |
Tables | string | | One or more comma-separated table references, a query expression enclosed with parentheses, or a set of tables specified with a wildcard. For example, E* would form the union of all the tables in the database whose names begin with E. Whenever the list of tables is known, refrain from using wildcards: some workspaces contain a very large number of tables, which leads to inefficient execution, and tables may also be added over time, leading to unpredictable results (see the sketch after this table). |
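As a minimal sketch of the wildcard caution above, assuming the relevant table names are known in advance (the names here are hypothetical):
// Prefer an explicit list of known tables:
union withsource=SourceTable TableA, TableB, TableC
| count
// ...over a wildcard such as Table*, which can also match tables added later
// and can be inefficient in workspaces with many tables.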
Returns
A table with as many rows as there are in all the input tables.
Examples
Tables with string in name or column
union K* | where * has "Kusto"
Rows from all tables in the database whose name starts with K, and in which any column includes the word Kusto.
Distinct count
union withsource=SourceTable kind=outer Query, Command
| where Timestamp > ago(1d)
| summarize dcount(UserId)
The number of distinct users that have produced either a Query event or a Command event over the past day. In the result, the 'SourceTable' column will indicate either "Query" or "Command".
Query
| where Timestamp > ago(1d)
| union withsource=SourceTable kind=outer
(Command | where Timestamp > ago(1d))
| summarize dcount(UserId)
This more efficient version produces the same result. It filters each table before creating the union.
Using isfuzzy=true
// Using union isfuzzy=true to access non-existing view:
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true
(View_1 | where x > 0),
(View_2 | where x > 0),
(View_3 | where x > 0)
| count
Output
Count |
---|
2 |
Observing the query status, the following warning is returned:
Failed to resolve entity 'View_3'
// Using union isfuzzy=true and wildcard access:
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true View*, SomeView*, OtherView*
| count
Output
Count |
---|
3 |
Observing the query status, the following warning is returned:
Failed to resolve entity 'SomeView*'
Source column types mismatch
let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
union withsource=TableName View_1, View_2
Output
TableName | x_long | x_int |
---|---|---|
View_1 | 1 | |
View_2 | | 2 |
let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
let View_3 = view () { print x_long=3 };
union withsource=TableName View_1, View_2, View_3
Output
TableName | x_long1 | x_int | x_long |
---|---|---|---|
View_1 | 1 | | |
View_2 | | 2 | |
View_3 | | | 3 |
Column x from View_1 received the suffix _long. Because a column named x_long already exists in the result schema, the column names were de-duplicated, producing a new column, x_long1.
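If the per-type columns aren't wanted, one possible workaround (a sketch, not part of the original example) is to normalize the column type in each union leg, for instance with project and tolong:
let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
union (View_1 | project x=tolong(x)), (View_2 | project x=tolong(x))
// Both legs now expose a single column x of type long, so no type-suffixed columns are produced.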
45 - where operator
Filters a table to the subset of rows that satisfy a predicate.
Syntax
T | where Predicate
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | Tabular input whose records are to be filtered. |
Predicate | string | ✔️ | Expression that evaluates to a bool for each row in T. |
Returns
Rows in T for which Predicate is true.
Performance tips
Use simple comparisons between column names and constants. (‘Constant’ means constant over the table - so
now()
andago()
are OK, and so are scalar values assigned using alet
statement.)For example, prefer
where Timestamp >= ago(1d)
towhere bin(Timestamp, 1d) == ago(1d)
.Simplest terms first: If you have multiple clauses conjoined with
and
, put first the clauses that involve just one column. SoTimestamp > ago(1d) and OpId == EventId
is better than the other way around.
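A minimal sketch that combines these tips, using the StormEvents table from the examples below; the one-day cutoff is an arbitrary illustration.
let cutoff = ago(1d);                 // scalar constant assigned with a let statement
StormEvents
| where StartTime >= cutoff           // simple column-versus-constant comparison first
    and BeginLocation != EndLocation  // column-versus-column comparison last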
For more information, see the summary of available String operators and the summary of available Numerical operators.
Examples
Order comparisons by complexity
The following query returns storm records that report damaged property, are floods, and start and end in different places.
Notice that we put the comparison between two columns last, as the where operator can’t use the index and forces a scan.
StormEvents
| project DamageProperty, EventType, BeginLocation, EndLocation
| where DamageProperty > 0
and EventType == "Flood"
and BeginLocation != EndLocation
The following table only shows the top 10 results. To see the full output, run the query.
DamageProperty | EventType | BeginLocation | EndLocation |
---|---|---|---|
5000 | Flood | FAYETTE CITY | LOWBER |
5000 | Flood | MORRISVILLE | WEST WAYNESBURG |
10000 | Flood | COPELAND | HARRIS GROVE |
5000 | Flood | GLENFORD | MT PERRY |
25000 | Flood | EAST SENECA | BUFFALO AIRPARK ARPT |
20000 | Flood | EBENEZER | SLOAN |
10000 | Flood | BUEL | CALHOUN |
10000 | Flood | GOODHOPE | WEST MILFORD |
5000 | Flood | DUNKIRK | FOREST |
20000 | Flood | FARMINGTON | MANNINGTON |
Check if column contains string
The following query returns the rows in which the word “cow” appears in any column.
StormEvents
| where * has "cow"