Queries

1 - Aggregation functions

1.1 - Aggregation Functions

Learn how to use aggregation functions to perform calculations on a set of values and return a single value.

An aggregation function performs a calculation on a set of values, and returns a single value. These functions are used in conjunction with the summarize operator. This article lists all available aggregation functions grouped by type. For scalar functions, see Scalar function types.
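
For example, the following query (a minimal sketch using the StormEvents sample table referenced throughout this article) groups rows by State and applies two aggregation functions to each group:

StormEvents
| summarize EventCount = count(), AvgPropertyDamage = avg(DamageProperty) by State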

Binary functions

| Function Name | Description |
|---|---|
| binary_all_and() | Returns aggregated value using the binary AND of the group. |
| binary_all_or() | Returns aggregated value using the binary OR of the group. |
| binary_all_xor() | Returns aggregated value using the binary XOR of the group. |

Dynamic functions

| Function Name | Description |
|---|---|
| buildschema() | Returns the minimal schema that admits all values of the dynamic input. |
| make_bag(), make_bag_if() | Returns a property bag of dynamic values within the group without/with a predicate. |
| make_list(), make_list_if() | Returns a list of all the values within the group without/with a predicate. |
| make_list_with_nulls() | Returns a list of all the values within the group, including null values. |
| make_set(), make_set_if() | Returns a set of distinct values within the group without/with a predicate. |

Row selector functions

| Function Name | Description |
|---|---|
| arg_max() | Returns one or more expressions when the argument is maximized. |
| arg_min() | Returns one or more expressions when the argument is minimized. |
| take_any(), take_anyif() | Returns a random non-empty value for the group without/with a predicate. |

Statistical functions

| Function Name | Description |
|---|---|
| avg() | Returns an average value across the group. |
| avgif() | Returns an average value across the group (with predicate). |
| count(), countif() | Returns a count of the group without/with a predicate. |
| count_distinct(), count_distinctif() | Returns a count of unique elements in the group without/with a predicate. |
| dcount(), dcountif() | Returns an approximate distinct count of the group elements without/with a predicate. |
| hll() | Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation. |
| hll_if() | Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation (with predicate). |
| hll_merge() | Returns a value for merged HLL results. |
| max(), maxif() | Returns the maximum value across the group without/with a predicate. |
| min(), minif() | Returns the minimum value across the group without/with a predicate. |
| percentile() | Returns a percentile estimation of the group. |
| percentiles() | Returns percentile estimations of the group. |
| percentiles_array() | Returns the percentile approximates of the array. |
| percentilesw() | Returns the weighted percentile approximate of the group. |
| percentilesw_array() | Returns the weighted percentile approximate of the array. |
| stdev(), stdevif() | Returns the standard deviation across the group for a population that is considered a sample without/with a predicate. |
| stdevp() | Returns the standard deviation across the group for a population that is considered representative. |
| sum(), sumif() | Returns the sum of the elements within the group without/with a predicate. |
| tdigest() | Returns an intermediate result for the percentiles approximation, the weighted percentile approximate of the group. |
| tdigest_merge() | Returns the merged tdigest value across the group. |
| variance(), varianceif() | Returns the variance across the group without/with a predicate. |
| variancep() | Returns the variance across the group for a population that is considered representative. |

1.2 - arg_max() (aggregation function)

Learn how to use the arg_max() aggregation function to find a row in a table that maximizes the input expression.

Finds a row in the table that maximizes the specified expression. It returns all columns of the input table or specified columns.

Syntax

arg_max (ExprToMaximize, * | ExprToReturn [, …])

Parameters

NameTypeRequiredDescription
ExprToMaximizestring✔️The expression for which the maximum value is determined.
ExprToReturnstring✔️The expression determines which columns’ values are returned, from the row that has the maximum value for ExprToMaximize. Use a wildcard * to return all columns.

Returns

Returns a row in the table that maximizes the specified expression ExprToMaximize, and the values of columns specified in ExprToReturn.

Examples

Find maximum latitude

The following example finds the maximum latitude of a storm event in each state.

StormEvents 
| summarize arg_max(BeginLat, BeginLocation) by State

Output

The results table displays only the first 10 rows.

StateBeginLatBeginLocation
MISSISSIPPI34.97BARTON
VERMONT45NORTH TROY
AMERICAN SAMOA-14.2OFU
HAWAII22.2113PRINCEVILLE
MINNESOTA49.35ARNESEN
RHODE ISLAND42WOONSOCKET
INDIANA41.73FREMONT
WEST VIRGINIA40.62CHESTER
SOUTH CAROLINA35.18LANDRUM
TEXAS36.4607DARROUZETT

Find last state fatal event

The following example finds the last time an event with a direct death happened in each state, showing all the columns.

The query first filters the events to include only those events where there was at least one direct death. Then the query returns the entire row with the most recent StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *) by State

Output

The results table displays only the first 10 rows and first three columns.

StateStartTimeEndTime
GUAM2007-01-27T11:15:00Z2007-01-27T11:30:00Z
MASSACHUSETTS2007-02-03T22:00:00Z2007-02-04T10:00:00Z
AMERICAN SAMOA2007-02-17T13:00:00Z2007-02-18T11:00:00Z
IDAHO2007-02-17T13:00:00Z2007-02-17T15:00:00Z
DELAWARE2007-02-25T13:00:00Z2007-02-26T01:00:00Z
WYOMING2007-03-10T17:00:00Z2007-03-10T17:00:00Z
NEW MEXICO2007-03-23T18:42:00Z2007-03-23T19:06:00Z
INDIANA2007-05-15T14:14:00Z2007-05-15T14:14:00Z
MONTANA2007-05-18T14:20:00Z2007-05-18T14:20:00Z
LAKE MICHIGAN2007-06-07T13:00:00Z2007-06-07T13:00:00Z

Handle nulls

The following example demonstrates null handling.

datatable(Fruit: string, Color: string, Version: int) [
    "Apple", "Red", 1,
    "Apple", "Green", int(null),
    "Banana", "Yellow", int(null),
    "Banana", "Green", int(null),
    "Pear", "Brown", 1,
    "Pear", "Green", 2,
]
| summarize arg_max(Version, *) by Fruit

Output

FruitVersionColor
Apple1Red
BananaYellow
Pear2Green

Comparison to max()

The arg_max() function differs from the max() function. The arg_max() function allows you to return other columns along with the maximum value, and max() only returns the maximum value itself.

Examples

arg_max()

Find the last time an event with a direct death happened, showing all the columns in the table.

The query first filters the events to only include events where there was at least one direct death. Then the query returns the entire row with the most recent (maximum) StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *)

The results table returns all the columns for the row containing the highest value in the expression specified.

| StartTime | EndTime | EpisodeId | EventId | State | EventType | … |
|---|---|---|---|---|---|---|
| 2007-12-31T15:00:00Z | 2007-12-31T15:00:00 | 12688 | 69700 | UTAH | Avalanche | … |

max()

Find the last time an event with a direct death happened.

The query filters events to only include events where there is at least one direct death, and then returns the maximum value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize max(StartTime)

The results table returns the maximum value of StartTime, without returning other columns for this record.

max_StartTime
2007-12-31T15:00:00Z

1.3 - arg_min() (aggregation function)

Learn how to use the arg_min() aggregation function to find a row in a table that minimizes the input expression.

Finds a row in the table that minimizes the specified expression. It returns all columns of the input table or specified columns.

Syntax

arg_min (ExprToMinimize, * | ExprToReturn [, …])

Parameters

NameTypeRequiredDescription
ExprToMinimizestring✔️The expression for which the minimum value is determined.
ExprToReturnstring✔️The expression determines which columns’ values are returned, from the row that has the minimum value for ExprToMinimize. Use a wildcard * to return all columns.

Null handling

When ExprToMinimize is null for all rows in a table, one row in the table is picked. Otherwise, rows where ExprToMinimize is null are ignored.

Returns

Returns a row in the table that minimizes ExprToMinimize, and the values of columns specified in ExprToReturn. Use a wildcard * to return the entire row.

Examples

Find the minimum latitude of a storm event in each state.

StormEvents 
| summarize arg_min(BeginLat, BeginLocation) by State

Output

The results table shown includes only the first 10 rows.

StateBeginLatBeginLocation
AMERICAN SAMOA-14.3PAGO PAGO
CALIFORNIA32.5709NESTOR
MINNESOTA43.5BIGELOW
WASHINGTON45.58WASHOUGAL
GEORGIA30.67FARGO
ILLINOIS37CAIRO
FLORIDA24.6611SUGARLOAF KEY
KENTUCKY36.5HAZEL
TEXAS25.92BROWNSVILLE
OHIO38.42SOUTH PT

Find the first time an event with a direct death happened in each state, showing all of the columns.

The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *) by State

Output

The results table shown includes only the first 10 rows and first 3 columns.

StateStartTimeEndTime
INDIANA2007-01-01T00:00:00Z2007-01-22T18:49:00Z
FLORIDA2007-01-03T10:55:00Z2007-01-03T10:55:00Z
NEVADA2007-01-04T09:00:00Z2007-01-05T14:00:00Z
LOUISIANA2007-01-04T15:45:00Z2007-01-04T15:52:00Z
WASHINGTON2007-01-09T17:00:00Z2007-01-09T18:00:00Z
CALIFORNIA2007-01-11T22:00:00Z2007-01-24T10:00:00Z
OKLAHOMA2007-01-12T00:00:00Z2007-01-18T23:59:00Z
MISSOURI2007-01-13T03:00:00Z2007-01-13T08:30:00Z
TEXAS2007-01-13T10:30:00Z2007-01-13T14:30:00Z
ARKANSAS2007-01-14T03:00:00Z2007-01-14T03:00:00Z

The following example demonstrates null handling.

datatable(Fruit: string, Color: string, Version: int) [
    "Apple", "Red", 1,
    "Apple", "Green", int(null),
    "Banana", "Yellow", int(null),
    "Banana", "Green", int(null),
    "Pear", "Brown", 1,
    "Pear", "Green", 2,
]
| summarize arg_min(Version, *) by Fruit

Output

FruitVersionColor
Apple1Red
BananaYellow
Pear1Brown

Comparison to min()

The arg_min() function differs from the min() function. The arg_min() function allows you to return additional columns along with the minimum value, and min() only returns the minimum value itself.

Examples

arg_min()

Find the first time an event with a direct death happened, showing all the columns in the table.

The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *)

The results table returns all the columns for the row containing the lowest value in the expression specified.

| StartTime | EndTime | EpisodeId | EventId | State | EventType | … |
|---|---|---|---|---|---|---|
| 2007-01-01T00:00:00Z | 2007-01-22T18:49:00Z | 2408 | 11929 | INDIANA | Flood | … |

min()

Find the first time an event with a direct death happened.

The query filters events to only include those where there is at least one direct death, and then returns the minimum value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize min(StartTime)

The results table returns the lowest value in the specific column only.

min_StartTime
2007-01-01T00:00:00Z

1.4 - avg() (aggregation function)

Learn how to use the avg() function to calculate the average value of an expression.

Calculates the average (arithmetic mean) of expr across the group.

Syntax

avg(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation.

Returns

Returns the average value of expr across the group.

Example

The following example returns the average number of damaged crops per state.

StormEvents
| summarize AvgDamageToCrops = avg(DamageCrops) by State

The results table shown includes only the first 10 rows.

StateAvgDamageToCrops
TEXAS7524.569241
KANSAS15366.86671
IOWA4332.477535
ILLINOIS44568.00198
MISSOURI340719.2212
GEORGIA490702.5214
MINNESOTA2835.991494
WISCONSIN17764.37838
NEBRASKA21366.36467
NEW YORK5.714285714

1.5 - avgif() (aggregation function)

Learn how to use the avgif() function to return the average value of an expression where the predicate evaluates to true.

Calculates the average of expr in records for which predicate evaluates to true.

Syntax

avgif (expr, predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation.
predicatestring✔️The predicate that if true, the expr calculated value is added to the average.

Returns

Returns the average value of expr in records where predicate evaluates to true.

Example

The following example calculates the average damage by state in cases where there was any damage.

StormEvents
| summarize Averagedamage=tolong(avg(DamageCrops)), AverageWhenDamage=tolong(avgif(DamageCrops, DamageCrops > 0)) by State

Output

The results table shown includes only the first 10 rows.

StateAveragedamageAveragewhendamage
TEXAS7524491291
KANSAS15366695021
IOWA433228203
ILLINOIS445682574757
MISSOURI3407198806281
GEORGIA49070257239005
MINNESOTA2835144175
WISCONSIN17764438188
NEBRASKA21366187726
NEW YORK510000

1.6 - binary_all_and() (aggregation function)

Learn how to use the binary_all_and() function to aggregate values using the binary AND operation.

Accumulates values using the binary AND operation for each summarization group, or in total if a group isn’t specified.

Syntax

binary_all_and (expr)

Parameters

NameTypeRequiredDescription
exprlong✔️The value used for the binary AND calculation.

Returns

Returns an aggregated value using the binary AND operation over records for each summarization group, or in total if a group isn’t specified.

Example

The following example produces CAFEF00D using binary AND operations:

datatable(num:long)
[
  0xFFFFFFFF, 
  0xFFFFF00F,
  0xCFFFFFFD,
  0xFAFEFFFF,
]
| summarize result = toupper(tohex(binary_all_and(num)))

Output

result
CAFEF00D

1.7 - binary_all_or() (aggregation function)

Learn how to use the binary_all_or() function to aggregate values using the binary OR operation.

Accumulates values using the binary OR operation for each summarization group, or in total if a group isn’t specified.

Syntax

binary_all_or (expr)

Parameters

NameTypeRequiredDescription
exprlong✔️The value used for the binary OR calculation.

Returns

Returns an aggregated value using the binary OR operation over records for each summarization group, or in total if a group isn’t specified.

Example

The following example produces CAFEF00D using binary OR operations:

datatable(num:long)
[
  0x88888008,
  0x42000000,
  0x00767000,
  0x00000005, 
]
| summarize result = toupper(tohex(binary_all_or(num)))

Output

result
CAFEF00D

1.8 - binary_all_xor() (aggregation function)

Learn how to use the binary_all_xor() function to aggregate values using the binary XOR operation.

Accumulates values using the binary XOR operation for each summarization group, or in total if a group is not specified.

Syntax

binary_all_xor (expr)

Parameters

NameTypeRequiredDescription
exprlong✔️The value used for the binary XOR calculation.

Returns

Returns a value that is aggregated using the binary XOR operation over records for each summarization group, or in total if a group isn’t specified.

Example

The following example produces CAFEF00D using binary XOR operations:

datatable(num:long)
[
  0x44404440,
  0x1E1E1E1E,
  0x90ABBA09,
  0x000B105A,
]
| summarize result = toupper(tohex(binary_all_xor(num)))

Output

results
CAFEF00D

1.9 - buildschema() (aggregation function)

Learn how to use the buildschema() function to build a table schema from a dynamic expression.

Builds the minimal schema that admits all values of DynamicExpr.

Syntax

buildschema (DynamicExpr)

Parameters

NameTypeRequiredDescription
DynamicExprdynamic✔️Expression used for the aggregation calculation.

Returns

Returns the minimal schema that admits all values of DynamicExpr.

Example

The following example builds a schema based on:

  • {"x":1, "y":3.5}
  • {"x":"somevalue", "z":[1, 2, 3]}
  • {"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]}
datatable(value: dynamic) [
    dynamic({"x":1, "y":3.5}),
    dynamic({"x":"somevalue", "z":[1, 2, 3]}),
    dynamic({"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]})
]
| summarize buildschema(value)

Results

schema_value
{“x”:[“long”,“string”],“y”:[“double”,{“w”:“string”}],“z”:{"indexer":[“long”,“string”]},“t”:{"indexer":“string”}}

Schema breakdown

In the resulting schema:

  • The root object is a container with four properties named x, y, z, and t.
  • Property x is either type long or type string.
  • Property y is either type double or another container with a property w of type string.
  • Property z is an array, indicated by the indexer keyword, where each item can be either type long or type string.
  • Property t is an array, indicated by the indexer keyword, where each item is a string.
  • Every property is implicitly optional, and any array might be empty.

1.10 - count_distinct() (aggregation function) - (preview)

Learn how to use the count_distinct() (aggregation function) to count unique values specified by a scalar expression per summary group.

Counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted.

If you only need an estimation of unique values count, we recommend using the less resource-consuming dcount aggregation function.

To count only records for which a predicate returns true, use the count_distinctif aggregation function.

Syntax

count_distinct (expr)

Parameters

NameTypeRequiredDescription
exprscalar✔️The expression whose unique values are to be counted.

Returns

Long integer value indicating the number of unique values of expr per summary group.

Example

The following example shows how many types of storm events happened in each state.

Function performance can be degraded when operating on multiple data sources from different clusters.

StormEvents
| summarize UniqueEvents=count_distinct(EventType) by State
| top 5 by UniqueEvents

Output

StateUniqueEvents
TEXAS27
CALIFORNIA26
PENNSYLVANIA25
GEORGIA24
NORTH CAROLINA23
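
If an estimate is sufficient, the same question can be answered with the less resource-consuming dcount() function. The following sketch is equivalent, except that the counts are approximate:

StormEvents
| summarize UniqueEvents=dcount(EventType) by State
| top 5 by UniqueEvents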

1.11 - count_distinctif() (aggregation function) - (preview)

Learn how to use the count_distinctif() function to count unique values of a scalar expression in records for which the predicate evaluates to true.

Conditionally counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted. Only records for which predicate evaluates to true are counted.

If you only need an estimation of unique values count, we recommend using the less resource-consuming dcountif aggregation function.

Syntax

count_distinctif (expr, predicate)

Parameters

NameTypeRequiredDescription
exprscalar✔️The expression whose unique values are to be counted.
predicatestring✔️The expression used to filter records to be aggregated.

Returns

Integer value indicating the number of unique values of expr per summary group, for all records for which the predicate evaluates to true.

Example

The following example shows how many types of death-causing storm events happened in each state. Only storm events with a nonzero count of deaths are counted.

StormEvents
| summarize UniqueFatalEvents=count_distinctif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where UniqueFatalEvents > 0
| top 5 by UniqueFatalEvents

Output

StateUniqueFatalEvents
TEXAS12
CALIFORNIA12
OKLAHOMA10
NEW YORK9
KANSAS9

1.12 - count() (aggregation function)

Learn how to use the count() function to count the number of records in a group.

Counts the number of records per summarization group, or total if summarization is done without grouping.

To only count records for which a predicate returns true, use countif().

Syntax

count()

Returns

Returns a count of the records per summarization group, or in total if summarization is done without grouping.

Example

The following example returns a count of events in states:

StormEvents
| summarize Count=count() by State

Output

StateCount
TEXAS4701
KANSAS3166
IOWA2337
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881
WISCONSIN1850
NEBRASKA1766
NEW YORK1750
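
To return a single total instead of a count per state, omit the by clause, as in this minimal sketch:

StormEvents
| summarize TotalCount=count()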

1.13 - countif() (aggregation function)

Learn how to use the countif() function to count the rows where the predicate evaluates to true.

Counts the rows in which predicate evaluates to true.

Syntax

countif (predicate)

Parameters

NameTypeRequiredDescription
predicatestring✔️The expression used for aggregation calculation. The value can be any scalar expression with a return type of bool.

Returns

Returns a count of rows in which predicate evaluates to true.

Examples

Count storms by state

This example shows the number of storms with damage to crops by state.

StormEvents
| summarize TotalCount=count(),TotalWithDamage=countif(DamageCrops >0) by State

The results table shown includes only the first 10 rows.

StateTotalCountTotalWithDamage
TEXAS470172
KANSAS316670
IOWA2337359
ILLINOIS202235
MISSOURI201678
GEORGIA198317
MINNESOTA188137
WISCONSIN185075
NEBRASKA1766201
NEW YORK17501

Count based on string length

This example shows the number of names with more than four letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize countif(strlen(name) > 4)

Output

countif_
2
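
To also see which names were counted, one possible sketch combines countif() with make_list_if() (covered later in this article) using the same predicate; the output column names here are illustrative:

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
// Count names longer than four letters and list those names in one pass
| summarize LongNameCount=countif(strlen(name) > 4), LongNames=make_list_if(name, strlen(name) > 4)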

1.14 - dcount() (aggregation function)

Learn how to use the dcount() function to return an estimate of the number of distinct values of an expression within a group.

Calculates an estimate of the number of distinct values that are taken by a scalar expression in the summary group.

Syntax

dcount (expr[, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The input whose distinct values are to be counted.
accuracyintThe value that defines the requested estimation accuracy. The default value is 1. See Estimation accuracy for supported values.

Returns

Returns an estimate of the number of distinct values of expr in the group.

Example

This example shows how many types of storm events happened in each state.

StormEvents
| summarize DifferentEvents=dcount(EventType) by State
| order by DifferentEvents

The results table shown includes only the first 10 rows.

StateDifferentEvents
TEXAS27
CALIFORNIA26
PENNSYLVANIA25
GEORGIA24
ILLINOIS23
MARYLAND23
NORTH CAROLINA23
MICHIGAN22
FLORIDA22
OREGON21
KANSAS21
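
The optional accuracy argument trades speed for precision. For example, the following sketch requests accuracy level 4, the slowest and most precise setting listed under Estimation accuracy:

StormEvents
| summarize DifferentEvents=dcount(EventType, 4) by State
| order by DifferentEvents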

Estimation accuracy

1.15 - dcountif() (aggregation function)

Learn how to use the dcountif() function to return an estimate of the number of distinct values of an expression for rows where the predicate evaluates to true.

Estimates the number of distinct values of expr for rows in which predicate evaluates to true.

Syntax

dcountif (expr, predicate [, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The expression used to filter rows.
accuracyintThe control between speed and accuracy. If unspecified, the default value is 1. See Estimation accuracy for supported values.

Returns

Returns an estimate of the number of distinct values of expr for rows in which predicate evaluates to true.

Example

This example shows how many types of fatal storm events happened in each state.

StormEvents
| summarize DifferentFatalEvents=dcountif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where DifferentFatalEvents > 0
| order by DifferentFatalEvents 

The results table shown includes only the first 10 rows.

StateDifferentFatalEvents
CALIFORNIA12
TEXAS12
OKLAHOMA10
ILLINOIS9
KANSAS9
NEW YORK9
NEW JERSEY7
WASHINGTON7
MICHIGAN7
MISSOURI7

Estimation accuracy

1.16 - hll_if() (aggregation function)

Learn how to use the hll_if() function to calculate the intermediate results of the dcount() function.

Calculates the intermediate results of dcount in records for which the predicate evaluates to true.

Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.

Syntax

hll_if (expr, predicate [, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The Expr used to filter records to add to the intermediate result of dcount.
accuracyintThe value that controls the balance between speed and accuracy. If unspecified, the default value is 1. For supported values, see Estimation accuracy.

Returns

Returns the intermediate results of distinct count of Expr for which Predicate evaluates to true.

Examples

The following query results in the number of unique flood event sources in Iowa and Kansas. It uses the hll_if() function to show only flood events.

StormEvents
| where State in ("IOWA", "KANSAS")
| summarize hll_flood = hll_if(Source, EventType == "Flood") by State
| project State, SourcesOfFloodEvents = dcount_hll(hll_flood)

Output

StateSourcesOfFloodEvents
KANSAS11
IOWA7

Estimation accuracy

| Accuracy | Speed | Error (%) |
|---|---|---|
| 0 | Fastest | 1.6 |
| 1 | Balanced | 0.8 |
| 2 | Slow | 0.4 |
| 3 | Slow | 0.28 |
| 4 | Slowest | 0.2 |

1.17 - hll_merge() (aggregation function)

Learn how to use the hll_merge() function to merge HLL results into a single HLL value.

Merges HLL results across the group into a single HLL value.

For more information, see the underlying algorithm (HyperLogLog) and estimation accuracy.

Syntax

hll_merge (hll)

Parameters

NameTypeRequiredDescription
hllstring✔️The column name containing HLL values to merge.

Returns

The function returns the merged HLL values of hll across the group.

Example

The following example shows HLL results across a group merged into a single HLL value.

StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)

Output

The results show only the first five results in the array.

hllMerged
[[1024,14],["-6903255281122589438","-7413697181929588220","-2396604341988936699",“5824198135224880646”,"-6257421034880415225", …],[]]
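
The merged HLL value is still an intermediate result. To convert it into an estimated distinct count, pass it to dcount_hll(), as in the following sketch (the output column name is illustrative):

StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
// Convert the merged intermediate HLL state into a distinct count estimate
| project EstimatedDistinctDamageValues = dcount_hll(hllMerged)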

Estimation accuracy

1.18 - hll() (aggregation function)

Learn how to use the hll() function to calculate the results of the dcount() function.

The hll() function estimates the number of unique values in a set of values. It does so by calculating, within the summarize operator, the intermediate results of the dcount approximation for each group of data.

Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.

Syntax

hll (expr [, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
accuracyintThe value that controls the balance between speed and accuracy. If unspecified, the default value is 1. For supported values, see Estimation accuracy.

Returns

Returns the intermediate results of distinct count of expr across the group.

Example

In the following example, the hll() function is used to estimate the number of unique values of the DamageProperty column within each 10-minute time bin of the StartTime column.

StormEvents
| summarize hll(DamageProperty) by bin(StartTime,10m)

Output

The results table shown includes only the first 10 rows.

StartTimehll_DamageProperty
2007-01-01T00:20:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T01:00:00Z[[1024,14],[“7755241107725382121”,"-5665157283053373866",“3803688792395291579”,"-1003235211361077779"],[]]
2007-01-01T02:00:00Z[[1024,14],["-1003235211361077779","-5665157283053373866",“7755241107725382121”],[]]
2007-01-01T02:20:00Z[[1024,14],[“7755241107725382121”],[]]
2007-01-01T03:30:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T03:40:00Z[[1024,14],["-5665157283053373866"],[]]
2007-01-01T04:30:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T05:30:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T06:30:00Z[[1024,14],[“1589522558235929902”],[]]
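
Because hll() produces an intermediate state rather than a number, it is typically paired with dcount_hll() to produce the estimate per group, for example per time bin (a sketch; the output column name is illustrative):

StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
// Convert each intermediate HLL state into the distinct count estimate for its bin
| project StartTime, DistinctDamageValues = dcount_hll(hllRes)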

Estimation accuracy

1.19 - make_bag_if() (aggregation function)

Learn how to use the make_bag_if() function to create a dynamic JSON property bag of expression values where the predicate evaluates to true.

Creates a dynamic JSON property bag (dictionary) of expr values in records for which predicate evaluates to true.

Syntax

make_bag_if(expr, predicate [, maxSize])

Parameters

NameTypeRequiredDescription
exprdynamic✔️The expression used for the aggregation calculation.
predicatebool✔️The predicate that evaluates to true, in order for expr to be added to the result.
maxSizeintThe limit on the maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic JSON property bag (dictionary) of expr values in records for which predicate evaluates to true. Nondictionary values are skipped. If a key appears in more than one row, an arbitrary value, out of the possible values for this key, is selected.

Example

The following example shows a packed JSON property bag.

let T = datatable(prop:string, value:string, predicate:bool)
[
    "prop01", "val_a", true,
    "prop02", "val_b", false,
    "prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag_if(p, predicate)

Output

dict
{ “prop01”: “val_a”, “prop03”: “val_c” }

Use the bag_unpack() plugin to transform the bag keys in the make_bag_if() output into columns.

let T = datatable(prop:string, value:string, predicate:bool)
[
    "prop01", "val_a", true,
    "prop02", "val_b", false,
    "prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag_if(p, predicate)
| evaluate bag_unpack(bag)

Output

prop01prop03
val_aval_c

1.20 - make_bag() (aggregation function)

Learn how to use the make_bag() aggregation function to create a dynamic JSON property bag.

Creates a dynamic JSON property bag (dictionary) of all the values of expr in the group.

Syntax

make_bag (expr [, maxSize])

Parameters

NameTypeRequiredDescription
exprdynamic✔️The expression used for the aggregation calculation.
maxSizeintThe limit on the maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic JSON property bag (dictionary) of all the values of Expr in the group, which are property bags. Nondictionary values are skipped. If a key appears in more than one row, an arbitrary value, out of the possible values for this key, is selected.

Example

The following example shows a packed JSON property bag.

let T = datatable(prop:string, value:string)
[
    "prop01", "val_a",
    "prop02", "val_b",
    "prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag(p)

Output

dict
{ “prop01”: “val_a”, “prop02”: “val_b”, “prop03”: “val_c” }

Use the bag_unpack() plugin for transforming the bag keys in the make_bag() output into columns.

let T = datatable(prop:string, value:string)
[
    "prop01", "val_a",
    "prop02", "val_b",
    "prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag(p)
| evaluate bag_unpack(bag)

Output

prop01prop02prop03
val_aval_bval_c

1.21 - make_list_if() (aggregation function)

Learn how to use the make_list_if() aggregation function to create a dynamic JSON object of expression values where the predicate evaluates to true.

Creates a dynamic array of expr values in the group for which predicate evaluates to true.

Syntax

make_list_if(expr, predicate [, maxSize])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️A predicate that has to evaluate to true in order for expr to be added to the result.
maxSizeintegerThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of expr values in the group for which predicate evaluates to true. If the input to the summarize operator isn’t sorted, the order of elements in the resulting array is undefined. If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.

Example

The following example shows a list of names with more than 4 letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize make_list_if(name, strlen(name) > 4)

Output

list_name
[“George”, “Ringo”]

1.22 - make_list_with_nulls() (aggregation function)

Learn how to use the make_list_with_nulls() aggregation function to create a dynamic JSON object (array) which includes null values.

Creates a dynamic array of all the values of expr in the group, including null values.

Syntax

make_list_with_nulls(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use to create the array.

Returns

Returns a dynamic JSON object (array) of all the values of expr in the group, including null values. If the input to the summarize operator isn’t sorted, the order of elements in the resulting array is undefined. If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.

Example

The following example shows null values in the results.

let shapes = datatable (name:string , sideCount: int)
[
    "triangle", int(null),
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| summarize mylist = make_list_with_nulls(sideCount)

Output

mylist
[null,4,4,5,6,7,8,9,10]

1.23 - make_list() (aggregation function)

Learn how to use the make_list() function to create a dynamic JSON object array of all the values of the expressions in the group.

Creates a dynamic array of all the values of expr in the group.

Syntax

make_list(expr [, maxSize])

Parameters

NameTypeRequiredDescription
exprdynamic✔️The expression used for the aggregation calculation.
maxSizeintThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of all the values of expr in the group. If the input to the summarize operator isn’t sorted, the order of elements in the resulting array is undefined. If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.

Examples

The examples in this section show how to use the syntax to help you get started.

One column

The following example uses the datatable, shapes, to return a list of shapes in a single column.

let shapes = datatable (name: string, sideCount: int)
[
    "triangle", 3,
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| summarize mylist = make_list(name)

Output

mylist
[“triangle”,“square”,“rectangle”,“pentagon”,“hexagon”,“heptagon”,“octagon”,“nonagon”,“decagon”]

Using the ‘by’ clause

The following example uses the make_list function and the by clause to create two lists of objects grouped by whether they have an even or odd number of sides.

let shapes = datatable (name: string, sideCount: int)
[
    "triangle", 3,
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| summarize mylist = make_list(name) by isEvenSideCount = sideCount % 2 == 0

Output

isEvenSideCountmylist
false[“triangle”,“pentagon”,“heptagon”,“nonagon”]
true[“square”,“rectangle”,“hexagon”,“octagon”,“decagon”]

Packing a dynamic object

The following example shows how to pack a dynamic object in a column before making it a list. It returns a boolean column, isEvenSideCount, indicating whether the side count is even or odd, and a mylist column that contains lists of packed bags in each category.

let shapes = datatable (name: string, sideCount: int)
[
    "triangle", 3,
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| extend d = bag_pack("name", name, "sideCount", sideCount)
| summarize mylist = make_list(d) by isEvenSideCount = sideCount % 2 == 0

Output

isEvenSideCountmylist
false[{“name”:“triangle”,“sideCount”:3},{“name”:“pentagon”,“sideCount”:5},{“name”:“heptagon”,“sideCount”:7},{“name”:“nonagon”,“sideCount”:9}]
true[{“name”:“square”,“sideCount”:4},{“name”:“rectangle”,“sideCount”:4},{“name”:“hexagon”,“sideCount”:6},{“name”:“octagon”,“sideCount”:8},{“name”:“decagon”,“sideCount”:10}]
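
To flatten such packed lists back into one row per element, one option is the mv-expand operator (not covered in this article). The following sketch continues the previous query:

let shapes = datatable (name: string, sideCount: int)
[
    "triangle", 3,
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| extend d = bag_pack("name", name, "sideCount", sideCount)
| summarize mylist = make_list(d) by isEvenSideCount = sideCount % 2 == 0
// Expand each list so that every packed bag becomes its own row again
| mv-expand mylist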

1.24 - make_set_if() (aggregation function)

Learn how to use the make_set_if() function to create a dynamic JSON object of a set of distinct values that an expression takes where the predicate evaluates to true.

Creates a dynamic array of the set of distinct values that expr takes in records for which predicate evaluates to true.

Syntax

make_set_if(expr, predicate [, maxSize])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️A predicate that has to evaluate to true in order for expr to be added to the result.
maxSizeintThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of the set of distinct values that expr takes in records for which predicate evaluates to true. The array’s sort order is undefined.

Example

The following example shows a list of names with more than four letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize make_set_if(name, strlen(name) > 4)

Output

set_name
[“George”, “Ringo”]

1.25 - make_set() (aggregation function)

Learn how to use the make_set() function to return a JSON array of the distinct values that the expression takes in the group.

Creates a dynamic array of the set of distinct values that expr takes in the group.

Syntax

make_set(expr [, maxSize])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
maxSizeintThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of the set of distinct values that expr takes in the group. The array’s sort order is undefined.

Example

Set from a scalar column

The following example shows the set of states grouped with the same amount of crop damage.

StormEvents 
| summarize states=make_set(State) by DamageCrops

The results table shown includes only the first 10 rows.

DamageCropsstates
0[“NORTH CAROLINA”,“WISCONSIN”,“NEW YORK”,“ALASKA”,“DELAWARE”,“OKLAHOMA”,“INDIANA”,“ILLINOIS”,“MINNESOTA”,“SOUTH DAKOTA”,“TEXAS”,“UTAH”,“COLORADO”,“VERMONT”,“NEW JERSEY”,“VIRGINIA”,“CALIFORNIA”,“PENNSYLVANIA”,“MONTANA”,“WASHINGTON”,“OREGON”,“HAWAII”,“IDAHO”,“PUERTO RICO”,“MICHIGAN”,“FLORIDA”,“WYOMING”,“GULF OF MEXICO”,“NEVADA”,“LOUISIANA”,“TENNESSEE”,“KENTUCKY”,“MISSISSIPPI”,“ALABAMA”,“GEORGIA”,“SOUTH CAROLINA”,“OHIO”,“NEW MEXICO”,“ATLANTIC SOUTH”,“NEW HAMPSHIRE”,“ATLANTIC NORTH”,“NORTH DAKOTA”,“IOWA”,“NEBRASKA”,“WEST VIRGINIA”,“MARYLAND”,“KANSAS”,“MISSOURI”,“ARKANSAS”,“ARIZONA”,“MASSACHUSETTS”,“MAINE”,“CONNECTICUT”,“GUAM”,“HAWAII WATERS”,“AMERICAN SAMOA”,“LAKE HURON”,“DISTRICT OF COLUMBIA”,“RHODE ISLAND”,“LAKE MICHIGAN”,“LAKE SUPERIOR”,“LAKE ST CLAIR”,“LAKE ERIE”,“LAKE ONTARIO”,“E PACIFIC”,“GULF OF ALASKA”]
30000[“TEXAS”,“NEBRASKA”,“IOWA”,“MINNESOTA”,“WISCONSIN”]
4000000[“CALIFORNIA”,“KENTUCKY”,“NORTH DAKOTA”,“WISCONSIN”,“VIRGINIA”]
3000000[“CALIFORNIA”,“ILLINOIS”,“MISSOURI”,“SOUTH CAROLINA”,“NORTH CAROLINA”,“MISSISSIPPI”,“NORTH DAKOTA”,“OHIO”]
14000000[“CALIFORNIA”,“NORTH DAKOTA”]
400000[“CALIFORNIA”,“MISSOURI”,“MISSISSIPPI”,“NEBRASKA”,“WISCONSIN”,“NORTH DAKOTA”]
50000[“CALIFORNIA”,“GEORGIA”,“NEBRASKA”,“TEXAS”,“WEST VIRGINIA”,“KANSAS”,“MISSOURI”,“MISSISSIPPI”,“NEW MEXICO”,“IOWA”,“NORTH DAKOTA”,“OHIO”,“WISCONSIN”,“ILLINOIS”,“MINNESOTA”,“KENTUCKY”]
18000[“WASHINGTON”,“WISCONSIN”]
107900000[“CALIFORNIA”]
28900000[“CALIFORNIA”]

Set from array column

The following example shows the set of elements in an array.

datatable (Val: int, Arr1: dynamic)
[
    1, dynamic(['A1', 'A2', 'A3']), 
    5, dynamic(['A2', 'C1']),
    7, dynamic(['C2', 'A3']),
    5, dynamic(['C2', 'A1'])
] 
| summarize Val_set=make_set(Val), Arr1_set=make_set(Arr1)
Val_setArr1_set
[1,5,7][“A1”,“A2”,“A3”,“C1”,“C2”]

1.26 - max() (aggregation function)

Learn how to use the max() function to find the maximum value of the expression in the table.

Finds the maximum value of the expression in the table.

Syntax

max(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression for which the maximum value is determined.

Returns

Returns the value in the table that maximizes the specified expression.

Example

The following example returns the last record in a table by querying the maximum value for StartTime.

StormEvents
| summarize LatestEvent=max(StartTime)

Output

LatestEvent
2007-12-31T23:53:00Z

1.27 - maxif() (aggregation function)

Learn how to use the maxif() function to calculate the maximum value of an expression where the predicate evaluates to true.

Calculates the maximum value of expr in records for which predicate evaluates to true.

See also - max() function, which returns the maximum value across the group without predicate expression.

Syntax

maxif(expr,predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The expression used to filter rows.

Returns

Returns the maximum value of expr in records for which predicate evaluates to true.

Example

This example shows the maximum damage for events with no casualties.

StormEvents
| extend Damage=DamageCrops + DamageProperty, Deaths=DeathsDirect + DeathsIndirect
| summarize MaxDamageNoCasualties=maxif(Damage, Deaths == 0) by State

Output

The results table shown includes only the first 10 rows.

StateMaxDamageNoCasualties
TEXAS25000000
KANSAS37500000
IOWA15000000
ILLINOIS5000000
MISSOURI500005000
GEORGIA344000000
MINNESOTA38390000
WISCONSIN45000000
NEBRASKA4000000
NEW YORK26000000

1.28 - min() (aggregation function)

Learn how to use the min() function to find the minimum value in a table.

Finds the minimum value of the expression in the table.

Syntax

min (expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression for which the minimum value is determined.

Returns

Returns the minimum value of expr across the table.

Example

This example returns the first record in a table.

StormEvents
| summarize FirstEvent=min(StartTime)

Output

FirstEvent
2007-01-01T00:00:00Z

1.29 - minif() (aggregation function)

Learn how to use the minif() function to return the minimum value of an expression where the predicate evaluates to true.

Returns the minimum of Expr in records for which Predicate evaluates to true.

  • Can be used only in context of aggregation inside summarize

See also - min() function, which returns the minimum value across the group without predicate expression.

Syntax

minif (Expr,Predicate)

Parameters

NameTypeRequiredDescription
Exprstring✔️Expression that will be used for aggregation calculation.
Predicatestring✔️Expression that will be used to filter rows.

Returns

The minimum value of Expr in records for which Predicate evaluates to true.

Example

This example shows the minimum damage for events with casualties, excluding events with zero damage.

StormEvents
| extend Damage=DamageCrops+DamageProperty, Deaths=DeathsDirect+DeathsIndirect
| summarize MinDamageWithCasualties=minif(Damage,(Deaths >0) and (Damage >0)) by State 
| where MinDamageWithCasualties >0 and isnotnull(MinDamageWithCasualties)

Output

The results table shown includes only the first 10 rows.

StateMinDamageWithCasualties
TEXAS8000
KANSAS5000
IOWA45000
ILLINOIS100000
MISSOURI10000
GEORGIA500000
MINNESOTA200000
WISCONSIN10000
NEW YORK25000
NORTH CAROLINA15000

1.30 - percentile(), percentiles()

Learn how to use the percentile(), percentiles() functions to calculate estimates for nearest rank percentiles.

The percentile() function calculates an estimate for the specified nearest-rank percentile of the population defined by expr. The accuracy depends on the density of population in the region of the percentile.

percentiles() works similarly to percentile(). However, percentiles() can calculate multiple percentile values at once, which is more efficient than calculating each percentile value separately.

To calculate weighted percentiles, see percentilesw().

Syntax

percentile(expr, percentile)

percentiles(expr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
percentileint or long✔️A constant that specifies the percentile.
percentilesint or long✔️One or more comma-separated percentiles.

Returns

Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.

Examples

Calculate single percentile

The following example calculates, for each state, the 95th percentile of DamageProperty: the value that is larger than 95% of the sample set and smaller than the remaining 5%.

StormEvents | summarize percentile(DamageProperty, 95) by State

Output

The results table shown includes only the first 10 rows.

Statepercentile_DamageProperty_95
ATLANTIC SOUTH0
FLORIDA40000
GEORGIA143333
MISSISSIPPI80000
AMERICAN SAMOA250000
KENTUCKY35000
OHIO150000
KANSAS51392
MICHIGAN49167
ALABAMA50000

Calculate multiple percentiles

The following example calculates the 5th, 50th (median), and 95th percentiles of DamageProperty simultaneously for each state.

StormEvents | summarize percentiles(DamageProperty, 5, 50, 95) by State

Output

The results table shown includes only the first 10 rows.

Statepercentile_DamageProperty_5percentile_DamageProperty_50percentile_DamageProperty_95
ATLANTIC SOUTH000
FLORIDA0040000
GEORGIA00143333
MISSISSIPPI0080000
AMERICAN SAMOA00250000
KENTUCKY0035000
OHIO02000150000
KANSAS0051392
MICHIGAN0049167
ALABAMA0050000

Return percentiles as an array

Instead of returning the values in individual columns, use the percentiles_array() function to return the percentiles in a single column of dynamic array type.

Syntax

percentiles_array(expr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
percentilesint, long, or dynamic✔️One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value.

Returns

Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.

Examples

Comma-separated percentiles

Multiple percentiles can be obtained as an array in a single dynamic column, instead of in multiple columns as with percentiles().

TransformedSensorsData
| summarize percentiles_array(Value, 5, 25, 50, 75, 95), avg(Value) by SensorName

Output

The results table displays only the first 10 rows.

SensorNamepercentiles_Valueavg_Value
sensor-82[“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”]0.493950914
sensor-130[“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”]0.505111463
sensor-56[“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”]0.497955018
sensor-24[“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”]0.501084379
sensor-47[“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”]0.49386228
sensor-135[“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”]0.494817619
sensor-74[“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”]0.501627252
sensor-173[“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”]0.505401226
sensor-28[“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”]0.502066244
sensor-34[“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”]0.504309494

Dynamic array of percentiles

Percentiles for percentiles_array can be specified in a dynamic array of integer or floating-point numbers. The array must be constant but doesn’t have to be literal.

TransformedSensorsData
| summarize percentiles_array(Value, dynamic([5, 25, 50, 75, 95])), avg(Value) by SensorName

Output

The results table displays only the first 10 rows.

SensorNamepercentiles_Valueavg_Value
sensor-82[“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”]0.493950914
sensor-130[“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”]0.505111463
sensor-56[“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”]0.497955018
sensor-24[“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”]0.501084379
sensor-47[“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”]0.49386228
sensor-135[“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”]0.494817619
sensor-74[“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”]0.501627252
sensor-173[“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”]0.505401226
sensor-28[“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”]0.502066244
sensor-34[“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”]0.504309494

Nearest-rank percentile

The P-th percentile (0 < P <= 100) of a list of values, sorted in ascending order, is the smallest value in the list such that P percent of the data is less than or equal to it (from the Wikipedia article on percentiles).

The 0-th percentile is defined to be the smallest member of the population.
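
As a concrete illustration, the following sketch computes the 50th percentile of the values 1 through 5; by the nearest-rank definition the expected result is 3 (the result is an approximation, so small deviations are possible):

range x from 1 to 5 step 1
| summarize percentile(x, 50)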

Estimation error in percentiles

The percentiles aggregate provides an approximate value using T-Digest.

1.31 - percentilew(), percentilesw()

Learn how to use the percentilew(), percentilesw() functions to calculate weighted percentiles.

The percentilew() function calculates a weighted estimate for the specified nearest-rank percentile of the population defined by expr. percentilesw() works similarly to percentilew(). However, percentilesw() can calculate multiple weighted percentile values at once, which is more efficient than calculating each weighted percentile value separately.

Weighted percentiles calculate percentiles in a dataset by giving each value in the input dataset a weight. In this method, each value is considered to be repeated a number of times equal to its weight, which is then used to calculate the percentile. By giving more importance to certain values, weighted percentiles provide a way to calculate percentiles in a “weighted” manner.

To calculate unweighted percentiles, see percentiles().

Syntax

percentilew(expr, weightExpr, percentile)

percentilesw(expr, weightExpr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
weightExprlong✔️The weight to give each value.
percentileint or long✔️A constant that specifies the percentile.
percentilesint or long✔️One or more comma-separated percentiles.

Returns

Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.

Examples

Calculate weighted percentiles

Assume you repetitively measure the time (Duration) it takes an action to complete. Instead of recording every value of the measurement, you record each value of Duration, rounded to 100 msec, and how many times the rounded value appeared (BucketSize).

Use summarize percentilesw(Duration, BucketSize, ...) to calculate the given percentiles in a “weighted” way. Treat each value of Duration as if it was repeated BucketSize times in the input, without actually needing to materialize those records.

The following example shows weighted percentiles, using this set of latency values in milliseconds: { 1, 1, 2, 2, 2, 5, 7, 7, 12, 12, 15, 15, 15, 18, 21, 22, 26, 35 }.

To reduce bandwidth and storage, do pre-aggregation to the following buckets: { 10, 20, 30, 40, 50, 100 }. Count the number of events in each bucket to produce the following table:

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable

The table displays:

  • Eight events in the 10-ms bucket (corresponding to subset { 1, 1, 2, 2, 2, 5, 7, 7 })
  • Six events in the 20-ms bucket (corresponding to subset { 12, 12, 15, 15, 15, 18 })
  • Three events in the 30-ms bucket (corresponding to subset { 21, 22, 26 })
  • One event in the 40-ms bucket (corresponding to subset { 35 })

At this point, the original data is no longer available; only the number of events in each bucket remains. To compute percentiles from this data, use the percentilesw() function. For the 50, 75, and 99.9 percentiles, use the following query:

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable
| summarize percentilesw(LatencyBucket, ReqCount, 50, 75, 99.9)

Output

percentile_LatencyBucket_50percentile_LatencyBucket_75percentile_LatencyBucket_99_9
202040

Return percentiles as an array

Instead of returning the values in individual columns, use the percentilesw_array() function to return the percentiles in a single column of dynamic array type.

Syntax

percentilesw_array(expr, weightExpr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
percentilesint, long, or dynamic✔️One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value.
weightExprlong✔️The weight to give each value.

Returns

Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.

Examples

Comma-separated percentiles

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, 50, 75, 99.9)

Output

percentile_LatencyBucket
[20, 20, 40]

Dynamic array of percentiles

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, dynamic([50, 75, 99.9]))

Output

percentile_LatencyBucket
[20, 20, 40]

1.32 - stdev() (aggregation function)

Learn how to use the stdev() aggregation function to calculate the standard deviation of an expression using Bessel’s correction.

Calculates the standard deviation of expr across the group, using Bessel’s correction for a small dataset that is considered a sample.

For a large dataset that is representative of the population, use stdevp() (aggregation function).

Formula

This function uses the following formula.

Image showing a Stdev sample formula.
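
In standard notation, this is the sample standard deviation with Bessel's correction:

$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} $$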

Syntax

stdev(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the standard deviation aggregation calculation.

Returns

Returns the standard deviation value of expr across the group.

Example

The following example shows the standard deviation for the group.

range x from 1 to 5 step 1
| summarize make_list(x), stdev(x)

Output

list_xstdev_x
[ 1, 2, 3, 4, 5]1.58113883008419

1.33 - stdevif() (aggregation function)

Learn how to use the stdevif() function to calculate the standard deviation of an expression where the predicate evaluates to true.

Calculates the standard deviation of expr in records for which predicate evaluates to true.

Syntax

stdevif(expr,predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the standard deviation aggregation calculation.
predicatestring✔️The predicate that has to evaluate to true in order for expr to be added to the result.

Returns

Returns the standard deviation value of expr in records for which predicate evaluates to true.

Example

The following example shows the standard deviation in a range of 1 to 100.

range x from 1 to 100 step 1
| summarize stdevif(x, x % 2 == 0)

Output

stdevif_x
29.1547594742265

1.34 - stdevp() (aggregation function)

Learn how to use the stdevp() aggregation function to calculate the standard deviation of an expression.

Calculates the standard deviation of expr across the group, considering the group as a population for a large dataset that is representative of the population.

For a small dataset that is a sample, use stdev() (aggregation function).

Formula

This function uses the following formula.

Image showing the stdevp population formula.
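In mathematical notation, this is the population standard deviation (no Bessel's correction), where $\bar{x}$ is the average of the values and $n$ is the number of records in the group:

$$\mathrm{stdevp} = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}}$$

This is consistent with the example below, where the stdevp of 1 through 5 is $\sqrt{2} \approx 1.4142$.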

Syntax

stdevp(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the standard deviation aggregation calculation.

Returns

Returns the standard deviation value of expr across the group.

Example

range x from 1 to 5 step 1
| summarize make_list(x), stdevp(x)

Output

| list_x | stdevp_x |
| --- | --- |
| [1, 2, 3, 4, 5] | 1.4142135623731 |

1.35 - sum() (aggregation function)

Learn how to use the sum() (aggregation function) function to calculate the sum of an expression across the group.

Calculates the sum of expr across the group.

Syntax

sum(expr)

Parameters

NameTypeRequiredDescription
expr string✔️The expression used for the aggregation calculation.

Returns

Returns the sum value of expr across the group.

Example

This example returns the total value of crop and property damages by state, sorted in descending order of total damages.

StormEvents 
| summarize EventCount=count(), TotalDamages = sum(DamageCrops+DamageProperty) by State 
| sort by TotalDamages

Output

The results table shown includes only the first 10 rows.

| State | EventCount | TotalDamages |
| --- | --- | --- |
| CALIFORNIA | 898 | 2801954600 |
| GEORGIA | 1983 | 1190448750 |
| MISSOURI | 2016 | 1096887450 |
| OKLAHOMA | 1716 | 916557300 |
| MISSISSIPPI | 1218 | 802890160 |
| KANSAS | 3166 | 738830000 |
| TEXAS | 4701 | 572086700 |
| OHIO | 1233 | 417989500 |
| FLORIDA | 1042 | 379455260 |
| NORTH DAKOTA | 905 | 342460100 |
| … | … | … |

1.36 - sumif() (aggregation function)

Learn how to use the sumif() (aggregation function) function to calculate the sum of an expression value in records for which the predicate evaluates to true.

Calculates the sum of expr in records for which predicate evaluates to true.

You can also use the sum() function, which sums rows without predicate expression.

Syntax

sumif(expr,predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The expression used to filter rows. If the predicate evaluates to true, the row will be included in the result.

Returns

Returns the sum of expr for which predicate evaluates to true.

Example showing the sum of damages based on no casualty count

This example shows the sum total damage for storms without casualties.

StormEvents
| summarize DamageNoCasualties=sumif((DamageCrops+DamageProperty),(DeathsDirect+DeathsIndirect)==0) by State

Output

The results table shown includes only the first 10 rows.

| State | DamageNoCasualties |
| --- | --- |
| TEXAS | 242638700 |
| KANSAS | 407360000 |
| IOWA | 135353700 |
| ILLINOIS | 120394500 |
| MISSOURI | 1096077450 |
| GEORGIA | 1077448750 |
| MINNESOTA | 230407300 |
| WISCONSIN | 241550000 |
| NEBRASKA | 70356050 |
| NEW YORK | 58054000 |

Example showing the sum of birth dates

This example shows the sum of the birth dates for all names that have more than 4 letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize sumif(day_of_birth, strlen(name) > 4)

Output

sumif_day_of_birth
32

1.37 - take_any() (aggregation function)

Learn how to use the take_any() (aggregation function) to return the value of an arbitrarily selected record.

Arbitrarily chooses one record for each group in a summarize operator, and returns the value of one or more expressions over each such record.

Syntax

take_any(expr_1 [, expr_2 …])

take_any(*)

Parameters

NameTypeRequiredDescription
expr_Nstring✔️The expression used for selecting a record. If the wildcard value (*) is given in place of an expression, all records will be selected.

Returns

The take_any aggregation function returns the values of the expressions calculated for each of the records selected nondeterministically from each group of the summarize operator.

If the * argument is provided, the function behaves as if the expressions are all columns of the input to the summarize operator barring the group-by columns, if any.

Remarks

This function is useful when you want to get a sample value of one or more columns per value of the compound group key.

When the function is provided with a single column reference, it will attempt to return a non-null/non-empty value, if such value is present.

As a result of the nondeterministic nature of this function, using this function multiple times in a single application of the summarize operator isn’t equivalent to using this function a single time with multiple expressions. The former may have each application select a different record, while the latter guarantees that all values are calculated over a single record (per distinct group).
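For instance, the following two queries over StormEvents aren't equivalent in this respect; a minimal sketch of the distinction described above:

// Each take_any() call may pick its value from a different record.
StormEvents
| summarize take_any(State), take_any(EventType)

// Both values are guaranteed to come from the same record.
StormEvents
| summarize take_any(State, EventType)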

Examples

Show indeterministic State:

StormEvents
| summarize take_any(State)

Output

State
ATLANTIC SOUTH

Show all the details for a random record:

StormEvents
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*)

Output

| StartTime | EpisodeId | State | EventType |
| --- | --- | --- | --- |
| 2007-09-29 08:11:00.0000000 | 11091 | ATLANTIC SOUTH | Waterspout |

Show all the details of a random record for each State starting with ‘A’:

StormEvents
| where State startswith "A"
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*) by State

Output

| State | StartTime | EpisodeId | EventType |
| --- | --- | --- | --- |
| ALASKA | 2007-02-01 00:00:00.0000000 | 1733 | Flood |
| ATLANTIC SOUTH | 2007-09-29 08:11:00.0000000 | 11091 | Waterspout |
| ATLANTIC NORTH | 2007-11-27 00:00:00.0000000 | 11523 | Marine Thunderstorm Wind |
| ARIZONA | 2007-12-01 10:40:00.0000000 | 11955 | Flash Flood |
| AMERICAN SAMOA | 2007-12-07 14:00:00.0000000 | 13183 | Flash Flood |
| ARKANSAS | 2007-12-09 16:00:00.0000000 | 11319 | Lightning |
| ALABAMA | 2007-12-15 18:00:00.0000000 | 12580 | Heavy Rain |

1.38 - take_anyif() (aggregation function)

Learn how to use the take_anyif() function to return the value of an arbitrarily selected record for which the predicate is 'true'.

Arbitrarily selects one record for each group in a summarize operator in records for which the predicate is 'true'. The function returns the value of an expression over each such record.

This function is useful when you want to get a sample value of one column per value of the compound group key, subject to some predicate that is true. If such a value is present, the function attempts to return a non-null/non-empty value.

Syntax

take_anyif( expr, predicate )

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for selecting a record.
predicatestring✔️Indicates which records may be considered for evaluation.

Returns

The take_anyif aggregation function returns the value of the expression calculated for each of the records randomly selected from each group of the summarize operator. Only records for which predicate returns 'true' may be selected. If the predicate doesn't return 'true', a null value is produced.

Examples

Pick a random EventType from Storm events, where event description has a key phrase.

StormEvents
| summarize take_anyif(EventType, EventNarrative has 'strong wind')

Output

EventType
Strong Wind

1.39 - tdigest_merge() (aggregation functions)

Learn how to use the tdigest_merge() aggregation function to merge tdigest results across the group.

Merges tdigest results across the group.

For more information about the underlying algorithm (T-Digest) and the estimated error, see estimation error in percentiles.

Syntax

tdigest_merge(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.

Returns

Returns the merged tdigest values of expr across the group.

Example

StormEvents
| summarize PreAggDamageProperty=tdigest(DamageProperty) by State
| summarize tdigest_merge(PreAggDamageProperty)

Output

merge_tdigests_PreAggDamageProperty
[[7],[91,30,73667,966,110000000,24428,2500,20000,16500000,6292,40000,123208,1000000,133091,90583,20000000,977000,20007,547000,19000000,1221,9600000,300000,70072,55940,75000,417500,1410000,20400000,331500,15000000,62000000,50222,121690000,160400,6200000,252500,450,11000000,2200000,5700000,11566,12000000,263,50000,200000,3700000,13286,171000,100000000,28200000,65000000,17709,30693,16000000,7938,5200,2875,1500000,3480000,151100000,9800000,18200000,21600000,199,2570000,30000000,38000000,72000,891250,500000000,26385,80092,27000000,35000000,754500,11500000,3262500,113945,5000,62429,175294,9071,6500000,3321,15159,21850000,300000000,22683,3000,10000000,60055,600000,52000000,496000,15000,50000000,10140000,11900000,2100000,62600000,77125,310667,70000000,101000000,2088,1608571,19182,400000,179833,775000,612000,150000000,13500000,2600000,1250000,65400,45000000,297000,2500000,40000000,24846,30000,59067,1893,15762,142571,220666,195000,2000000,355000,2275000,6000000,46000000,38264,50857,4002,97333,27750,1000,1111429,7043,272500,455200,503,37500000,10000,1489,0,1200000,110538,60000000,250000,10730,1901429,291000,698750,649000,2716667,137000000,6400000,29286,41051,6850000,102000,4602,80000000,250000000,371667,8000000,729,8120000,5000000,20830,152400,803300,349667,202000,207000,81150000,48000000,750000,26000000,8900000,239143,75000000,248000,14342,74857,5992,500000,150000,938000,10533333,45248,105000000,7000000,35030,4000000,2000,7692500,3000000,25000000,4500000,87222,12054,100000,25000,9771,4840000,28000000,1307143,32024],[19,1,3,32,1,14,45,572,1,51,126,41,101,11,12,8,2,14,4,1,27,1,58,42,20,177,6,4,1,12,10,2,9,1,5,1,2,28,3,6,1,23,4,30,610,145,1,21,4,2,1,1,24,13,1,153,5,4,26,5,1,6,1,1,28,1,5,1,11,4,1,13,44,2,4,2,1,4,9,1672,7,17,47,2,39,17,2,1,17,666,16,71,21,3,1,530,10,1,1,2,1,4,6,4,1,20,7,11,40,6,2,1,1,2,1,3,5,2,1,21,2,13,271,3,14,23,7,15,2,41,1,2,7,1,27,7,205,3,4,1403,7,69,4,10,215,1,1472,127,45756,10,13,1,198,17,7,1,12,7,6,1,1,14,7,2,2,17,1,2,3,2,48,5,21,10,5,10,21,4,5,1,2,39,2,2,7,1,1,22,7,60,175,119,3,3,40,1,8,101,15,1135,4,22,3,3,9,76,430,611,12,1,2,7,8]]

1.40 - tdigest() (aggregation function)

Learn how to use the tdigest() (aggregation function) function to calculate the intermediate results of the weighted percentiles of expressions across the group.

Calculates the intermediate results of percentiles() across the group.

For more information, see the underlying algorithm (T-Digest) and the estimated error.

Syntax

tdigest(expr [, weight])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
weightstringThe weights of the values for the aggregation calculation.

Returns

The intermediate results of the weighted percentiles of expr across the group.

Examples

Results per state

This example shows the results of the tdigest percentiles sorted by state.

StormEvents
| summarize tdigest(DamageProperty) by State

The results table shown includes only the first 10 rows.

Statetdigest_DamageProperty
NEBRASKA[[7],[800,250,300000,5000,240000,1500000,20000,550000,0,75000,100000,1000,10000,30000,13000,2000000,1000000,650000,125000,35000,7000,2500000,4000000,450000,85000,460000,500000,6000,150000,350000,4000,72500,1200000,180000,400000,25000,50000,2000,45000,8000,120000,200000,40000,1200,15000,55000,3000,250000],[5,1,3,72,1,1,44,1,1351,12,24,17,46,13,6,1,2,1,2,6,8,1,1,1,2,1,4,2,6,1,2,2,1,1,2,26,18,12,2,2,1,7,6,4,28,4,6,6]]
MINNESOTA[[7],[700,500,2000000,2500,1200000,12000000,16000,7000000,0,300000,425000,750,6000,30000,10000,22000000,10000000,9600000,600000,50000,4000,27000000,35000000,4000000,400000,5000000,6000000,3000,750000,2500000,2000,250000,11000000,38000000,3000000,20000,120000,1000,100000,5000,500000,1000000,60000,800,15000,200000,1500,1500000,900000],[1,3,1,3,1,2,1,1,1793,1,1,2,2,2,3,1,1,1,2,2,1,1,1,1,2,1,2,1,1,1,6,1,1,1,3,5,1,5,2,5,2,2,1,2,2,2,2,1,1]]
KANSAS[[7],[667,200,6000000,3400,80000,300000,18875,210000,0,45857,750000,37500000,10000,81150000,15000000,6400000,2570000,225000,59400,25000,5000,400000,7000000,4500000,2500000,6500000,200000,4500,70000,122500,2785,12000000,1900000,18200000,150000,1150000,27000000,2000,30000,2000000,250000000,75000,26000,1500,1500000,1000000,2500,100000,21600000,50000,335000,600000,175000,500000,160000,51000,40000,20000,15000,252500,7520,350000,250000,3400000,1000,338000,16000000,106000,4840000,305000,540000,337500,9800000,45000,12500,700000,4000000,71000,30000000,35000,3700000,22000,56000],[12,2,2,5,2,3,8,1,2751,7,2,1,37,1,1,1,1,2,5,12,33,8,1,1,1,2,10,1,5,2,7,1,4,1,5,1,1,9,11,4,1,5,2,6,4,8,2,23,1,44,2,3,2,3,1,1,1,18,5,2,5,1,7,1,25,1,1,3,1,1,1,2,6,1,1,2,1,1,1,3,1,1,1]]
NEW MEXICO[[7],[600,500,2500000,7000,1500,28000,40000,10000,0,500000,20000,1000,21000,70000,25000,3500000,200000,16500000,50000,100000,15000,4000,5000,2000],[1,3,1,1,1,1,1,7,466,1,7,4,1,1,2,1,1,1,1,2,1,4,10,8]]
KENTUCKY[[7],[600,200,700000,5000,400000,12000,15000,100000,0,60000,80000,1000,9000,20000,10000,50000,30000,300000,120000,25000,7000,3000,500000,11500000,75000,35000,8000,6000,150000,1500000,4000,56000,1911,250000,2500000,18000,45000,2000],[6,2,1,42,1,3,9,8,999,2,1,52,1,21,37,25,7,2,3,14,11,35,1,1,6,10,9,10,4,1,13,1,9,3,1,2,1,37]]
VIRGINIA[[7],[536,500,125000,3000,100000,7250,8000,60000,0,40000,50000,956,6000,11500,7000,25000,15000,98000,70000,12000,4000,2000,120000,1000000,45000,16000,5000,3500,75000,175000,2500,30000,1000,80000,300000,10000,20000,1500],[7,11,1,48,2,2,2,1,1025,2,6,9,2,2,1,5,16,1,3,5,12,122,1,1,1,1,64,2,2,1,1,7,209,3,2,42,19,6]]
OREGON[[7],[5000,1000,60000,434000,20000,50000,100000,500000,0,1500000,20400000,6000,62600000],[8,2,1,1,1,1,3,1,401,1,1,1,1]]
ALASKA[[7],[5000,1000,25000,700000,12060,15000,100000,1600000,0,10000],[5,1,1,1,1,2,1,2,242,1]]
CONNECTICUT[[7],[5000,1000,2000000,0,50000,750000,6000],[1,1,1,142,1,1,1]]
NEVADA[[7],[5000,1000,200000,1000000,30000,40000,297000,5000000,0,10000],[4,2,1,1,1,1,1,1,148,3]]

Convert pre-existing centroids

The following example shows how one can convert pre-existing T-Digest centroids for long-term storage. The V column holds the value of each centroid, and the W column is its weight (relative count). The tdigest() aggregate function is then applied to convert the data in table DT into the internal representation, and percentile_tdigest() is used to demonstrate how to find the 50th percentile value.

let DT=datatable(V:real, W:long) [
    1.0, 1,
    2.0, 2
];
DT
| summarize TD=tdigest(V, W)
| project P50=percentile_tdigest(TD, 50)
Output

P50
2

1.41 - variance() (aggregation function)

Learn how to use the variance() aggregation function to calculate the sample variance of the expression across the group.

Calculates the variance of expr across the group, considering the group as a sample.

The following formula is used:

Image showing a variance sample formula.
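In mathematical notation, this is the sample variance with Bessel's correction, where $\bar{x}$ is the average of the values and $n$ is the number of records in the group:

$$\mathrm{variance} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$

This matches the example below: the sample variance of 1 through 5 is 2.5.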

Syntax

variance(expr)

Parameters

NameTypeRequiredDescription
exprreal✔️The expression used for the variance calculation.

Returns

Returns the variance value of expr across the group.

Example

range x from 1 to 5 step 1
| summarize make_list(x), variance(x) 

Output

| list_x | variance_x |
| --- | --- |
| [1, 2, 3, 4, 5] | 2.5 |

1.42 - varianceif() (aggregation function)

Learn how to use the varianceif() function to calculate the variance in an expression where the predicate evaluates to true.

Calculates the variance of expr in records for which predicate evaluates to true.

Syntax

varianceif(expr, predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for the variance calculation.
predicatestring✔️If predicate evaluates to true, the expr calculated value will be added to the variance.

Returns

Returns the variance value of expr in records for which predicate evaluates to true.

Example

range x from 1 to 100 step 1
| summarize varianceif(x, x%2 == 0)

Output

varianceif_x
850

1.43 - variancep() (aggregation function)

Learn how to use the variancep() aggregation function to calculate the population variance of an expression across the group.

Calculates the variance of expr across the group, considering the group as a population.

The following formula is used:

Image showing the variancep population formula.
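In mathematical notation, this is the population variance, where $\bar{x}$ is the average of the values and $n$ is the number of records in the group:

$$\mathrm{variancep} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}$$

This matches the example below: the population variance of 1 through 5 is 2.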

Syntax

variancep(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for the variance calculation.

Returns

Returns the variance value of expr across the group.

Example

range x from 1 to 5 step 1
| summarize make_list(x), variancep(x) 

Output

| list_x | variancep_x |
| --- | --- |
| [1, 2, 3, 4, 5] | 2 |

2 - Best practices for KQL queries

2.1 - Best practices for Kusto Query Language queries

This article describes Query best practices.

Here are several best practices to follow to make your query run faster.

In short

ActionUseDon’t useNotes
Reduce the amount of data being queriedUse mechanisms such as the where operator to reduce the amount of data being processed.For more information on efficient ways to reduce the amount of data being processed, see Reduce the amount of data being processed.
Avoid using redundant qualified referencesWhen referencing local entities, use the unqualified name.For more information, see Avoid using redundant qualified references.
datetime columnsUse the datetime data type.Don’t use the long data type.In queries, don’t use Unix time conversion functions, such as unixtime_milliseconds_todatetime(). Instead, use update policies to convert Unix time to the datetime data type during ingestion.
String operatorsUse the has operator.Don’t use containsWhen looking for full tokens, has works better, since it doesn’t look for substrings.
Case-sensitive operatorsUse ==.Don’t use =~.Use case-sensitive operators when possible.
Use in.Don’t use in~.
Use contains_cs.Don’t use contains.Using has/has_cs is preferred to contains/contains_cs.
Searching textLook in a specific column.Don’t use *.* does a full text search across all columns.
Extract fields from dynamic objects across millions of rowsMaterialize your column at ingestion time if most of your queries extract fields from dynamic objects across millions of rows.With this method you only pay once for column extraction.
Lookup for rare keys/values in dynamic objectsUse `MyTable | where DynamicColumn has "Rare value" | where DynamicColumn.SomeKey == "Rare value"`.Don't use `MyTable | where DynamicColumn.SomeKey == "Rare value"`.This way, most records are filtered out before the dynamic column is parsed.
let statement with a value that you use more than onceUse the materialize() function.For more information on how to use materialize(), see materialize(). For more information, see Optimize queries that use named expressions.
Apply type conversions on more than one billion recordsReshape your query to reduce the amount of data fed into the conversion.Don’t convert large amounts of data if it can be avoided.
New queriesUse limit [small number] or count at the end.Running unbound queries over unknown datasets can yield a return of gigabytes of results, resulting in a slow response and a busy environment.
Case-insensitive comparisonsUse Col =~ "lowercasestring".Don’t use tolower(Col) == "lowercasestring".
Compare data already in lowercase (or uppercase)Col == "lowercasestring" (or Col == "UPPERCASESTRING").Avoid using case insensitive comparisons.
Filtering on columnsFilter on a table column.Don’t filter on a calculated column.
Use `T | where predicate(Expression)`.Don't use `T | extend _value = Expression | where predicate(_value)`.
summarize operatorUse the hint.shufflekey=<key> when the group by keys of the summarize operator have high cardinality.High cardinality is ideally more than one million.
join operatorSelect the table with the fewest rows as the first one (left-most in query).
Use in instead of left semi join for filtering by a single column.
Join across clustersRun the query on the “right” side of the join across remote environments, such as clusters or Eventhouses, where most of the data is located.
Join when left side is small and right side is largeUse hint.strategy=broadcast.Small refers to up to 100 megabytes (MB) of data.
Join when right side is small and left side is largeUse the lookup operator instead of the join operatorIf the right side of the lookup is larger than several tens of MB, the query fails.
Join when both sides are too largeUse hint.shufflekey=<key>.Use when the join key has high cardinality.
Extract values on column with strings sharing the same format or patternUse the parse operator.Don’t use several extract() statements.For example, values like "Time = <time>, ResourceId = <resourceId>, Duration = <duration>, ....".
extract() functionUse when parsed strings don’t all follow the same format or pattern.Extract the required values by using a REGEX.
materialize() functionPush all possible operators that reduce the materialized dataset and still keep the semantics of the query.For example, filters, or project only required columns. For more information, see Optimize queries that use named expressions.
Use materialized viewsUse materialized views for storing commonly used aggregations. Prefer using the materialized_view() function to query materialized part only.materialized_view('MV')

Reduce the amount of data being processed

A query’s performance depends directly on the amount of data it needs to process. The less data is processed, the quicker the query (and the fewer resources it consumes). Therefore, the most important best-practice is to structure the query in such a way that reduces the amount of data being processed.

In order of importance:

  • Only reference tables whose data is needed by the query. For example, when using the union operator with wildcard table references, it’s better from a performance point-of-view to only reference a handful of tables, instead of using a wildcard (*) to reference all tables and then filter data out using a predicate on the source table name.

  • Take advantage of a table’s data scope if the query is relevant only for a specific scope. The table() function provides an efficient way to eliminate data by scoping it according to the caching policy (the DataScope parameter).

  • Apply the where query operator immediately following table references.

  • When using the where query operator, the order in which you place the predicates, whether you use a single where operator or multiple consecutive where operators, can have a significant effect on the query performance, as the sketch after this list illustrates.

  • Apply predicates that act upon datetime table columns first. Kusto includes an efficient index on such columns, often completely eliminating whole data shards without needing to access those shards.

  • Then apply predicates that act upon string and dynamic columns, especially such predicates that apply at the term-level. Order the predicates by the selectivity. For example, searching for a user ID when there are millions of users is highly selective and usually involves a term search, for which the index is very efficient.

  • Then apply predicates that are selective and are based on numeric columns.

  • Last, for queries that scan a table column’s data (for example, for predicates such as contains "@!@!", that have no terms and don’t benefit from indexing), order the predicates such that the ones that scan columns with less data are first. Doing so reduces the need to decompress and scan large columns.
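The following sketch applies this ordering to a hypothetical Traces table with Timestamp (datetime), UserId (string), and Message (string) columns; the table, columns, and filter values are assumptions for illustration only:

Traces
| where Timestamp > ago(1h)        // datetime predicate first: benefits from the datetime index
| where UserId has "e1ff8e3b"      // selective, term-level string predicate next
| where Message contains "@!@!"    // scanning predicate (no terms) last
| count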

Avoid using redundant qualified references

Reference entities such as tables and materialized views by name.

For example, the table T can be referenced as simply T (the unqualified name), or by using a database qualifier (for example, database("DB").T when the table is in a database called DB), or by using a fully qualified name (for example, cluster("<serviceURL>").database("DB").T).

It’s a best practice to avoid using name qualifications when they’re redundant, for the following reasons:

  1. Unqualified names are easier to identify (for a human reader) as belonging to the database-in-scope.

  2. Referencing database-in-scope entities is always at least as fast, and in some cases much faster, than referencing entities that belong to other databases. This is especially true when those databases are in a different cluster or Eventhouse. Avoiding qualified names helps the reader to do the right thing.
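As a minimal sketch, assuming the query already runs in the context of the database DB that holds T, the unqualified form is all that's needed:

// Preferred inside database DB: unqualified reference.
T
| take 10

// Redundant in the same context (no faster, and harder to read):
// database("DB").T | take 10
// cluster("<serviceURL>").database("DB").T | take 10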

2.2 - Named expressions

Learn how to optimally use named expressions.

This article discusses how to optimize repeat use of named expressions in a query.

In Kusto Query Language, you can bind names to complex expressions in several different ways, such as with a let statement, with the as operator, or through function parameters.

When you reference these named expressions in a query, the following steps occur:

  1. The calculation within the named expression is evaluated. This calculation produces either a scalar or tabular value.
  2. The named expression is replaced with the calculated value.

If the same bound name is used multiple times, then the underlying calculation will be repeated multiple times. When is this a concern?

  • When the calculations consume many resources and are used many times.
  • When the calculation is non-deterministic, but the query assumes that all invocations return the same value.

Mitigation

To mitigate these concerns, you can materialize the calculation results in memory during the query. Depending on the way the named calculation is defined, you’ll use different materialization strategies:

Tabular functions

Use the following strategies for tabular functions:

  • let statements and function parameters: Use the materialize() function.
  • as operator: Set the hint.materialized hint value to true.

For example, the following query uses the non-deterministic tabular sample operator:

Behavior without using the materialize function

range x from 1 to 100 step 1
| sample 1
| as T
| union T

Output

x
63
92

Behavior using the materialize function

range x from 1 to 100 step 1
| sample 1
| as hint.materialized=true T
| union T

Output

x
95
95
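The same effect is available for a let statement by wrapping the bound expression in materialize(); a minimal sketch:

let sampled = materialize(range x from 1 to 100 step 1 | sample 1);
sampled
| union sampled

Because the sampled row is materialized once, both output rows hold the same value of x.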

Scalar functions

Non-deterministic scalar functions can be forced to calculate exactly once by using toscalar().

For example, the following query uses the non-deterministic function, rand():

let x = () {rand(1000)};
let y = () {toscalar(rand(1000))};
print x, x, y, y

Output

| print_0 | print_1 | print_2 | print_3 |
| --- | --- | --- | --- |
| 166 | 137 | 70 | 70 |

3 - Data types

3.1 - Null values

Learn how to use and understand null values.

All scalar data types in Kusto have a special value that represents a missing value. This value is called the null value, or null.

Null literals

The null value of a scalar type T is represented in the query language by the null literal T(null).

The following query returns a single row full of null values:

print bool(null), datetime(null), dynamic(null), guid(null), int(null), long(null), real(null), double(null), timespan(null)

Predicates on null values

The scalar function isnull() can be used to determine if a scalar value is the null value. The corresponding function isnotnull() can be used to determine if a scalar value isn’t the null value.
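For example, the following sketch checks both functions against a null literal and returns true and false, respectively:

print isnull(int(null)), isnotnull(int(null))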

Equality and inequality of null values

  • Equality (==): Applying the equality operator to two null values yields bool(null). Applying the equality operator to a null value and a non-null value yields bool(false).
  • Inequality (!=): Applying the inequality operator to two null values yields bool(null). Applying the inequality operator to a null value and a non-null value yields bool(true).

For example:

datatable(val:int)[5, int(null)]
| extend IsBiggerThan3 = val > 3
| extend IsBiggerThan3OrNull = val > 3 or isnull(val)
| extend IsEqualToNull = val == int(null)
| extend IsNotEqualToNull = val != int(null)

Output

| val | IsBiggerThan3 | IsBiggerThan3OrNull | IsEqualToNull | IsNotEqualToNull |
| --- | --- | --- | --- | --- |
| 5 | true | true | false | true |
| null | null | true | null | null |

Null values and aggregation functions

When most aggregation functions, such as avg(), min(), max(), and sum(), are applied to entities that include null values, the null values are ignored and don't factor into the calculation.
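For example, in the following sketch the null value doesn't contribute to the aggregates:

datatable(val:int)[5, int(null), 15]
| summarize avg(val), min(val), max(val)   // 10, 5, 15: the null row is ignored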

Null values and the where operator

The where operator uses Boolean expressions to determine whether to emit each input record to the output. This operator treats null values as if they're bool(false). Records for which the predicate returns the null value are dropped and don't appear in the output.

For example:

datatable(ival:int, sval:string)[5, "a", int(null), "b"]
| where ival != 5

Output

| ival | sval |
| --- | --- |
| null | b |

Null values and binary operators

Binary operators are scalar operators that accept two scalar values and produce a third value. For example, greater-than (>) and Boolean AND (&&) are binary operators.

For all binary operators, except as noted in Exceptions to this rule, the rule is as follows:

If one or both of the values input to the binary operator are null values, then the output of the binary operator is also the null value. In other words, the null value is “sticky”.

Exceptions to this rule

  • For the equality (==) and inequality (!=) operators, if one of the values is null and the other value isn’t null, then the result is either bool(false) or bool(true), respectively.
  • For the logical AND (&&) operator, if one of the values is bool(false), the result is also bool(false).
  • For the logical OR (||) operator, if one of the values is bool(true), the result is also bool(true).

For example:

datatable(val:int)[5, int(null)]
| extend Add = val + 10
| extend Multiply = val * 10

Output

| val | Add | Multiply |
| --- | --- | --- |
| 5 | 15 | 50 |
| null | null | null |
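The exceptions behave differently. In the following sketch, each comparison or logical operation involving a null value still produces a non-null result, as described in the exception list above:

print eq = int(null) == 5,          // false: equality of a null and a non-null value
      ne = int(null) != 5,          // true: inequality of a null and a non-null value
      andOp = bool(null) and false, // false: AND with a false operand
      orOp = bool(null) or true     // true: OR with a true operand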

Null values and the logical NOT (!) operator

The logical NOT operator not() yields the value bool(null) if the argument is the null value.

Null values and the in operator

  • The in operator behaves like a logical OR of equality comparisons.
  • The !in operator behaves like a logical AND of inequality comparisons.

Null values and data ingestion

For most data types, a missing value in the data source produces a null value in the corresponding table cell. However, columns of type string are an exception to this rule: in CSV (or CSV-like) data formats, a missing value produces an empty string.

For example:

.create table T(a:string, b:int)

.ingest inline into table T
[,]
[ , ]
[a,1]

T
| project a, b, isnull_a=isnull(a), isempty_a=isempty(a), strlen_a=strlen(a), isnull_b=isnull(b)

Output

| a | b | isnull_a | isempty_a | strlen_a | isnull_b |
| --- | --- | --- | --- | --- | --- |
|  |  | false | true | 0 | true |
|  |  | false | false | 1 | true |
| a | 1 | false | false | 1 | false |

3.2 - Scalar data types

This article describes Scalar data types.

Every data value, like the value of an expression or a function parameter, has a data type which is either a scalar data type or a user-defined record. A scalar data type is one of the built-in predefined types in Supported data types. A user-defined record is an ordered sequence of name and scalar-data-type pairs, like the data type of a row in a table.

As in most languages, the data type determines what calculations and manipulations can be run against a value. For example, if you have a value that is of type string, you won’t be able to perform arithmetic calculations against it.

Supported data types

In Kusto Query Language, most of the data types follow standard conventions and have names you’ve probably seen before. The following table shows the full list:

TypeDescription
bool (boolean)true (1) or false (0).
datetime (date)An instant in time, typically expressed as a date and time of day.
decimalA 128-bit wide, decimal number.
dynamicAn array, a property bag, or a value of any of the other scalar data types.
guid (uuid, uniqueid)A 128-bit globally unique value.
intA signed, 32-bit wide, integer.
longA signed, 64-bit wide, integer.
real (double)A 64-bit wide, double-precision, floating-point number.
stringA sequence of zero or more Unicode characters.
timespan (time)A time interval.

While most of the data types are standard, you might be less familiar with types like dynamic or timespan, and guid.

  • Dynamic has a structure similar to JSON, but with one key difference: It can store Kusto Query Language-specific data types that traditional JSON can’t, such as a nested dynamic value, or timespan.

  • Timespan is a data type that refers to a measure of time such as hours, days, or seconds. Don’t confuse timespan with datetime, which evaluates to an actual date and time, not a measure of time. The following table shows a list of timespan suffixes.

  • GUID is a datatype representing a 128-bit, globally unique identifier, which follows the standard format of [8]-[4]-[4]-[4]-[12], where each [number] represents the number of characters and each character can range from 0-9 or a-f.

Null values

All nonstring data types can be null. When a value is null, it indicates an absence or mismatch of data. For example, if you try to input the string abc into an integer column, it results in the null value. To check if an expression is null, use the isnull() function.

For more information, see Null values.

3.3 - The bool data type

This article describes the bool data type.

The bool data type can be: true (1), false (0), or null.

bool literals

To specify a bool literal, use one of the following syntax options:

SyntaxDescription
true or bool(true)Represents trueness.
false or bool(false)Represents falsehood.
bool(null)Represents the null value.

Boolean operators

The bool data type supports all of the logical operators: equality (==), inequality (!=), logical-and (and), and logical-or (or).

3.4 - The datetime data type

This article describes the datetime data type.

The datetime data type represents an instant in time, typically expressed as a date and time of day. Values range from 00:00:00 (midnight), January 1, 0001 Anno Domini (Common Era) through 11:59:59 P.M., December 31, 9999 A.D. (C.E.) in the Gregorian calendar.

Time values are measured in 100-nanosecond units called ticks, and a particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.) in the GregorianCalendar calendar (excluding ticks that would be added by leap seconds). For example, a ticks value of 31241376000000000 represents the date, Friday, January 01, 0100 12:00:00 midnight. This is sometimes called “a moment in linear time”.

datetime literals

To specify a datetime literal, use one of the following syntax options:

SyntaxDescriptionExample
datetime(year-month-day hour:minute:second.milliseconds)A date and time in UTC format.datetime(2015-12-31 23:59:59.9)
datetime(year-month-day)A date in UTC format.datetime(2015-12-31)
datetime()Returns the current time.
datetime(null)Represents the null value.

The now() and ago() special functions

Kusto provides two special functions, now() and ago(), to allow queries to reference the time at which the query starts execution.

Supported formats

There are several formats for datetime that are supported as datetime() literals and the todatetime() function.

ISO 8601

FormatExample
%Y-%m-%dT%H:%M:%s%z2014-05-25T08:20:03.123456Z
%Y-%m-%dT%H:%M:%s2014-05-25T08:20:03.123456
%Y-%m-%dT%H:%M2014-05-25T08:20
%Y-%m-%d %H:%M:%s%z2014-11-08 15:55:55.123456Z
%Y-%m-%d %H:%M:%s2014-11-08 15:55:55
%Y-%m-%d %H:%M2014-11-08 15:55
%Y-%m-%d2014-11-08

RFC 822

FormatExample
%w, %e %b %r %H:%M:%s %ZSat, 8 Nov 14 15:05:02 GMT
%w, %e %b %r %H:%M:%sSat, 8 Nov 14 15:05:02
%w, %e %b %r %H:%MSat, 8 Nov 14 15:05
%w, %e %b %r %H:%M %ZSat, 8 Nov 14 15:05 GMT
%e %b %r %H:%M:%s %Z8 Nov 14 15:05:02 GMT
%e %b %r %H:%M:%s8 Nov 14 15:05:02
%e %b %r %H:%M8 Nov 14 15:05
%e %b %r %H:%M %Z8 Nov 14 15:05 GMT

RFC 850

FormatExample
%w, %e-%b-%r %H:%M:%s %ZSaturday, 08-Nov-14 15:05:02 GMT
%w, %e-%b-%r %H:%M:%sSaturday, 08-Nov-14 15:05:02
%w, %e-%b-%r %H:%M %ZSaturday, 08-Nov-14 15:05 GMT
%w, %e-%b-%r %H:%MSaturday, 08-Nov-14 15:05
%e-%b-%r %H:%M:%s %Z08-Nov-14 15:05:02 GMT
%e-%b-%r %H:%M:%s08-Nov-14 15:05:02
%e-%b-%r %H:%M %Z08-Nov-14 15:05 GMT
%e-%b-%r %H:%M08-Nov-14 15:05

Sortable

FormatExample
%Y-%n-%e %H:%M:%s2014-11-08 15:05:25
%Y-%n-%e %H:%M:%s %Z2014-11-08 15:05:25 GMT
%Y-%n-%e %H:%M2014-11-08 15:05
%Y-%n-%e %H:%M %Z2014-11-08 15:05 GMT
%Y-%n-%eT%H:%M:%s2014-11-08T15:05:25
%Y-%n-%eT%H:%M:%s %Z2014-11-08T15:05:25 GMT
%Y-%n-%eT%H:%M2014-11-08T15:05
%Y-%n-%eT%H:%M %Z2014-11-08T15:05 GMT

3.5 - The decimal data type

This article describes the decimal data type.

The decimal data type represents a 128-bit wide, decimal number.

decimal literals

To specify a decimal literal, use one of the following syntax options:

| Syntax | Description | Example |
| --- | --- | --- |
| decimal(number) | A decimal number represented by one or more digits, followed by a decimal point, and then one or more digits. | decimal(1.0) |
| decimal(number e exponent) | A decimal number represented by scientific notation. | decimal(1e5) is equivalent to 100,000 |
| decimal(null) | Represents the null value. | |

3.6 - The dynamic data type

This article describes The dynamic data type.

The dynamic scalar data type can be any of the following values:

  • An array of dynamic values, holding zero or more values with zero-based indexing.
  • A property bag that maps unique string values to dynamic values. The property bag has zero or more such mappings (called “slots”), indexed by the unique string values. The slots are unordered.
  • A value of any of the primitive scalar data types: bool, datetime, guid, int, long, real, string, and timespan.
  • Null. For more information, see Null values.

Dynamic literals

To specify a dynamic literal, use one of the following syntax options:

SyntaxDescriptionExample
dynamic([value [, …]])An array of dynamic or other scalar literals.dynamic([1, 2, "hello"])
dynamic({key = value [, …]})A property bag, or object. The value for a key can be a nested property bag.dynamic({"a":1, "b":{"a":2}})
dynamic(value)A dynamic value holding the value of the inner scalar data type.dynamic(4)
dynamic(null)Represents the null value.

Dynamic object accessors

To subscript a dictionary, use either the dot notation (dict.key) or the brackets notation (dict["key"]). When the subscript is a string constant, both options are equivalent.

In the examples below dict and arr are columns of dynamic type:

ExpressionAccessor expression typeMeaningComments
dict[col]Entity name (column)Subscripts a dictionary using the values of the column col as the keyColumn must be of type string
arr[index]Entity index (column)Subscripts an array using the values of the column index as the indexColumn must be of type integer or boolean
arr[-index]Entity index (column)Retrieves the ‘index’-th value from the end of the arrayColumn must be of type integer or boolean
arr[(-1)]Entity indexRetrieves the last value in the array
arr[toint(indexAsString)]Function callCasts the values of column indexAsString to int and use them to subscript an array
dict[[‘where’]]Keyword used as entity name (column)Subscripts a dictionary using the values of column where as the keyEntity names that are identical to some query language keywords must be quoted
dict.[‘where’] or dict[‘where’]ConstantSubscripts a dictionary using where string as the key

Accessing a sub-object of a dynamic value yields another dynamic value, even if the sub-object has a different underlying type. Use the gettype function to discover the actual underlying type of the value, and any of the cast functions listed below to cast it to the actual type.

Casting dynamic objects

After subscripting a dynamic object, you must cast the value to a simple type.

| Expression | Value | Type |
| --- | --- | --- |
| X | parse_json('[100,101,102]') | array |
| X[0] | parse_json('100') | dynamic |
| toint(X[1]) | 101 | int |
| Y | parse_json('{"a1":100, "a b c":"2015-01-01"}') | dictionary |
| Y.a1 | parse_json('100') | dynamic |
| Y["a b c"] | parse_json("2015-01-01") | dynamic |
| todate(Y["a b c"]) | datetime(2015-01-01) | datetime |

Cast functions are:

  • tolong()
  • todouble()
  • todatetime()
  • totimespan()
  • tostring()
  • toguid()
  • parse_json()

Building dynamic objects

Several functions enable you to create new dynamic objects:

  • bag_pack() creates a property bag from name/value pairs.
  • pack_array() creates an array from a list of values (this can be a list of columns; for each row, it creates an array from the specified columns).
  • range() creates an array with an arithmetic series of numbers.
  • zip() pairs “parallel” values from two arrays into a single array.
  • repeat() creates an array with a repeated value.

Additionally, there are several aggregate functions which create dynamic arrays to hold aggregated values (a combined sketch follows this list):

  • buildschema() returns the aggregate schema of multiple dynamic values.
  • make_bag() returns a property bag of dynamic values within the group.
  • make_bag_if() returns a property bag of dynamic values within the group (with a predicate).
  • make_list() returns an array holding all values, in sequence.
  • make_list_if() returns an array holding all values, in sequence (with a predicate).
  • make_list_with_nulls() returns an array holding all values, in sequence, including null values.
  • make_set() returns an array holding all unique values.
  • make_set_if() returns an array holding all unique values (with a predicate).
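The following sketch combines a creation function with two of the aggregate functions above, building an array and a merged property bag from the input rows:

datatable(name:string, value:int)
[
    "a", 1,
    "b", 2
]
| summarize value_list=make_list(value), value_bag=make_bag(bag_pack(name, value))

The result is value_list = [1, 2] and value_bag = {"a": 1, "b": 2}.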

Operators and functions over dynamic types

For a complete list of scalar dynamic/array functions, see dynamic/array functions.

Operator or functionUsage with dynamic data types
value in arrayTrue if there’s an element of array that == value
where City in ('London', 'Paris', 'Rome')
value !in arrayTrue if there’s no element of array that == value
array_length(array)Null if it isn’t an array
bag_has_key(bag,key)Checks whether a dynamic bag column contains a given key.
bag_keys(bag)Enumerates all the root keys in a dynamic property-bag object.
bag_merge(bag1,…,bagN)Merges dynamic property-bags into a dynamic property-bag with all properties merged.
bag_set_key(bag,key,value)Sets a given key to a given value in a dynamic property-bag.
extract_json(path,object), extract_json(path,object)Use path to navigate into object.
parse_json(source)Turns a JSON string into a dynamic object.
range(from,to,step)An array of values.
mv-expand listColumnReplicates a row for each value in a list in a specified cell.
summarize buildschema(column)Infers the type schema from column content.
summarize make_bag(column)Merges the property bag (dictionary) values in the column into one property bag, without key duplication.
summarize make_bag_if(column,predicate)Merges the property bag (dictionary) values in the column into one property bag, without key duplication (with predicate).
summarize make_list(column)Flattens groups of rows and puts the values of the column in an array.
summarize make_list_if(column,predicate)Flattens groups of rows and puts the values of the column in an array (with predicate).
summarize make_list_with_nulls(column)Flattens groups of rows and puts the values of the column in an array, including null values.
summarize make_set(column)Flattens groups of rows and puts the values of the column in an array, without duplication.

Indexing for dynamic data

Every field is indexed during data ingestion. The scope of the index is a single data shard.

To index dynamic columns, the ingestion process enumerates all “atomic” elements within the dynamic value (property names, values, array elements) and forwards them to the index builder. Otherwise, dynamic fields have the same inverted term index as string fields.

Examples

Dynamic property bag

The following query creates a dynamic property bag.

print o=dynamic({"a":123, "b":"hello", "c":[1,2,3], "d":{}})
| extend a=o.a, b=o.b, c=o.c, d=o.d

For convenience, dynamic literals that appear in the query text itself may also include other Kusto literals with types: datetime, timespan, real, long, guid, bool, and dynamic. This extension over JSON isn’t available when parsing strings (such as when using the parse_json function or when ingesting data), but it enables you to do the following:

print d=dynamic({"a": datetime(1970-05-11)})

To parse a string value that follows the JSON encoding rules into a dynamic value, use the parse_json function. For example:

  • parse_json('[43, 21, 65]') - an array of numbers
  • parse_json('{"name":"Alan", "age":21, "address":{"street":432,"postcode":"JLK32P"}}') - a dictionary
  • parse_json('21') - a single value of dynamic type containing a number
  • parse_json('"21"') - a single value of dynamic type containing a string
  • parse_json('{"a":123, "b":"hello", "c":[1,2,3], "d":{}}') - gives the same value as o in the example above.

Ingest data into dynamic columns

The following example shows how you can define a table that holds a dynamic column (as well as a datetime column) and then ingest a single record into it. It also demonstrates how you can encode JSON strings in CSV files.

// dynamic is just like any other type:
.create table Logs (Timestamp:datetime, Trace:dynamic)

// Everything between the "[" and "]" is parsed as a CSV line would be:
// 1. Since the JSON string includes double-quotes and commas (two characters
//    that have a special meaning in CSV), we must CSV-quote the entire second field.
// 2. CSV-quoting means adding double-quotes (") at the immediate beginning and end
//    of the field (no spaces allowed before the first double-quote or after the second
//    double-quote!)
// 3. CSV-quoting also means doubling-up every instance of a double-quotes within
//    the contents.

.ingest inline into table Logs
  [2015-01-01,"{""EventType"":""Demo"", ""EventValue"":""Double-quote love!""}"]

Output

| Timestamp | Trace |
| --- | --- |
| 2015-01-01 00:00:00.0000000 | {"EventType":"Demo","EventValue":"Double-quote love!"} |

3.7 - The guid data type

This article describes The guid data type.

The guid data type represents a 128-bit globally unique value.

guid literals

To specify a guid literal, use one of the following syntax options:

SyntaxDescriptionExample
guid(id)A guid ID string.guid(74be27de-1e4e-49d9-b579-fe0b331d3642)
guid(null)Represents the null value.

3.8 - The int data type

This article describes the int data type.

The int data type represents a signed, 32-bit wide, integer.

int literals

To specify an int literal, use one of the following syntax options:

| Syntax | Description | Example |
| --- | --- | --- |
| int(number) | A positive integer. | int(2) |
| int(-number) | A negative integer. | int(-2) |
| int(null) | Represents the null value. | |

3.9 - The long data type

This article describes the long data type.

The long data type represents a signed, 64-bit wide, integer.

By default, integer literals, including integers represented with hexadecimal syntax, are of type long.

long literals

To specify a long literal, use one of the following syntax options:

| Syntax | Description | Example |
| --- | --- | --- |
| number | An integer. You don't need to wrap the integer with long() because integers are by default of type long. | 12 |
| 0xhex | An integer represented with hexadecimal syntax. | 0xf is equivalent to 15 |
| long(-number) | A negative integer. | long(-1) |
| long(null) | Represents the null value. | |

3.10 - The real data type

This article describes the real data type.

The real data type represents a 64-bit wide, double-precision, floating-point number.

By default, decimal numbers and numbers represented with scientific notation are of type real.

real literals

To specify a real literal, use one of the following syntax options:

SyntaxDescriptionExample
numberA real number represented by one or more digits, followed by a decimal point, and then one or more digits.1.0
numbereexponentA real number represented by scientific notation.1e5
real(null)Represents the null value.
real(nan)Not-a-number (NaN), such as when dividing a 0.0 by another 0.0.
real(+inf)Positive infinity, such as when dividing 1.0 by 0.0.
real(-inf)Negative infinity, such as when dividing -1.0 by 0.0.

3.11 - The string data type

Learn about the string data type.

The string data type represents a sequence of zero or more Unicode characters.

For information on string query operators, see String operators.

string literals

A string literal is a string enclosed in quotes. You can use double quotes or single quotes to encode string literals in query text. With double quotes, you must escape nested double quote characters with a backslash (\). With single quotes, you must escape nested single quote characters, and you don’t need to escape double quotes.

Use the backslash character to escape the enclosing quote characters, tab characters (\t), newline characters (\n), and the backslash itself (\\).

Verbatim string literals

Verbatim string literals are string literals prepended with the @ character, which serves as a verbatim identifier. In this form, the backslash character (\) stands for itself and isn’t an escape character. In verbatim string literals, double quotes are escaped with double quotes and single quotes are escaped with single quotes.

For an example, see Verbatim string.

Multi-line string literals

Indicate a multi-line string literal by a “triple-backtick chord” (```) at the beginning and end of the literal.

For an example, see Multi-line string literal.

Concatenation of separated string literals

In a Kusto query, when two or more adjacent string literals have no separation between them, they’re automatically combined to form a new string literal. Similarly, if the string literals are separated only by whitespace or comments, they’re also combined to form a new string literal.

For an example, see Concatenated string literals.

Obfuscated string literals

Queries are stored for telemetry and analysis. To safeguard sensitive information like passwords and secrets, you can mark a string as an obfuscated string literal. In the logged query text, these marked strings are replaced with asterisks (*).

An obfuscated string literal is created by prepending an h or an H character in front of a standard or verbatim string literal.

For an example, see Obfuscated string literal.

Examples

String literal with quotes

The following example demonstrates how to use quotes within string literals encompassed by single quotes and double quotes. For more information, see String literals.

print
    s1 = 'string with "double quotes"',
    s2 = "string with 'single quotes'"

Output

| s1 | s2 |
| --- | --- |
| string with “double quotes” | string with ‘single quotes’ |

String literal with backslash escaping

The following example creates a regular expression pattern using backslashes to escape special characters. For more information, see String literals.

print pattern = '\\n.*(>|\'|=|\")[a-zA-Z0-9/+]{86}=='

Output

pattern
\n.*(>|’|=|")[a-zA-Z0-9/+]{86}==

String literal with Unicode

The following example shows how to use a backslash escape sequence to include a Unicode character in a string literal.

print space = "Hello\u00A0World"

Output

space
Hello World

Verbatim string literal

The following example creates a path in which the backslashes are part of the path instead of escape characters. To do this, the @ sign is prepended to the string, creating a verbatim string literal.

print myPath = @'C:\Folder\filename.txt'

Output

myPath
C:\Folder\filename.txt

Multi-line string literal

The following example shows the syntax for a multi-line string literal, which uses newlines and tabs to style a code block. For more information, see Multi-line string literals.

print program = ```
  public class Program {
    public static void Main() {
      System.Console.WriteLine("Hello!");
    }
  }```

Output

program
public class Program { public static void Main() { System.Console.WriteLine(“Hello!”); } }

Concatenated string literals

The following expressions all yield a string of length 13. For more information, see Concatenation of separated string literals.

print 
    none = strlen("Hello"', '@"world!"),
    whitespace = strlen("Hello" ', ' @"world!"),
    whitespaceAndComment = strlen("Hello" 
        // Comment
        ', '@"world!"
    );

Output

| none | whitespace | whitespaceAndComment |
| --- | --- | --- |
| 13 | 13 | 13 |

Obfuscated string literal

In the following query output, the h string is visible in your results. However, in tracing or telemetry, the h string is stored in an obfuscated form and substituted with asterisks in the log. For more information, see Obfuscated string literals.

print blob="https://contoso.blob.core.windows.net/container/blob.txt?"
    h'sv=2012-02-12&se=2013-04-13T0...'

Output

blob
https://contoso.blob.core.windows.net/container/blob.txt?sv=2012-02-12&se=2013-04-13T0

3.12 - The timespan data type

This article describes The timespan data type.

The timespan data type represents a time interval.

timespan literals

To specify a timespan literal, use one of the following syntax options:

SyntaxDescriptionExampleLength of time
ndA time interval represented by one or more digits followed by d for days.2d2 days
nhA time interval represented by one or more digits followed by h for hours.1.5h1.5 hours
nmA time interval represented by one or more digits followed by m for minutes.30m30 minutes
nsA time interval represented by one or more digits followed by s for seconds.10s10 seconds
nmsA time interval represented by one or more digits followed by ms for milliseconds.100ms100 milliseconds
nmicrosecondA time interval represented by one or more digits followed by microsecond.10microsecond10 microseconds
ntickA time interval represented by one or more digits followed by tick to indicate 100-nanosecond units.1tick100 ns
timespan(n seconds)A time interval in seconds.timespan(15 seconds)15 seconds
timespan(n)A time interval in days.timespan(2)2 days
timespan(days.hours:minutes:seconds.milliseconds)A time interval in days, hours, minutes, and seconds passed.timespan(0.12:34:56.7)0d+12h+34m+56.7s
timespan(null)Represents the null value.

timespan operators

Two values of type timespan may be added, subtracted, and divided. The last operation returns a value of type real representing the fractional number of times one value fits into the other.

Examples

The following example calculates how many seconds are in a day in several ways:

print
    result1 = 1d / 1s,
    result2 = time(1d) / time(1s),
    result3 = 24 * 60 * time(00:01:00) / time(1s)

This example converts the number of seconds in a day (represented by an integer value) to a timespan unit:

print 
    seconds = 86400
| extend t = seconds * 1s

4 - Entities

4.1 - Columns

This article describes Columns.

Columns are named entities that have a scalar data type. Columns are referenced in the query relative to the tabular data stream that is in context of the specific operator referencing them. Every table in Kusto, and every tabular data stream, is a rectangular grid of columns and rows. The columns of a table or a tabular data stream are ordered, so a column also has a specific position in the table’s collection of columns.

Reference columns in queries

In queries, columns are generally referenced by name only. They can only appear in expressions, and the query operator under which the expression appears determines the table or tabular data stream. The column’s name doesn’t need to be scoped further.

For example, in the following query we have an unnamed tabular data stream that is defined through the datatable operator and has a single column, c. The tabular data stream is filtered by a predicate on the value of that column, and produces a new unnamed tabular data stream with the same columns but fewer rows. The as operator then names the tabular data stream, and its value is returned as the results of the query. Notice how column c is referenced by name without referencing its container:

datatable (c:int) [int(-1), 0, 1, 2, 3]
| where c*c >= 2
| as Result

4.2 - Databases

This article describes Databases.

Databases are named entities that hold tables and stored functions. Kusto follows a relational model of storing the data where the upper-level entity is a database.

A single cluster or Eventhouse can host several databases, in which each database hosts its own collection of tables, stored functions, and external tables. Each database has its own set of permissions that follow the Role Based Access Control (RBAC) model.

4.3 - Entities

This article describes Entities.

Kusto queries execute in the context of a Kusto database. Data in the database is arranged in tables, which the query may reference, and within the table it is organized as a rectangular grid of columns and rows. Additionally, queries may reference stored functions in the database, which are query fragments made available for reuse.

  • Clusters are entities that hold databases. Clusters have no name, but they can be referenced by using the cluster() special function with the cluster’s URI. For example, cluster("https://help.kusto.windows.net") is a reference to a cluster that holds the Samples database.

  • Databases are named entities that hold tables and stored functions. All Kusto queries run in the context of some database, and the entities of that database may be referenced by the query with no qualifications. Additionally, other databases may be referenced using the database() special function. For example, cluster("https://help.kusto.windows.net").database("Samples") is a universal reference to a specific database.

  • Tables are named entities that hold data. A table has an ordered set of columns, and zero or more rows of data, each row holding one data value for each of the columns of the table. Tables may be referenced by name only if they are in the database in context of the query, or by qualifying them with a database reference otherwise. For example, cluster("https://help.kusto.windows.net").database("Samples").StormEvents is a universal reference to a particular table in the Samples database. Tables may also be referenced by using the table() special function.

  • Columns are named entities that have a scalar data type. Columns are referenced in the query relative to the tabular data stream that is in context of the specific operator referencing them.

  • Stored functions are named entities that allow reuse of Kusto queries or query parts.

  • Views are virtual tables based on functions (stored or defined in an ad-hoc fashion).

  • External tables are entities that reference data stored outside a Kusto database. External tables are used for exporting data from Kusto to external storage, as well as for querying external data without ingesting it into Kusto.

4.4 - Entity names

This article describes Entity names.

Kusto entities are referenced in a query by name. Entities that can be referenced by their name include databases, tables, columns, and stored functions, but not clusters. The name you assign an entity is called an identifier. In addition to entities, you can also assign an identifier to query parameters, or variables set through a let statement.

An entity’s name is unique to the entity type in the context of its container. For example, two tables in the same database can’t have the same name, but a database and a table can have the same name because they’re different entity types. Similarly, a table and a stored function may have the same name.

Pretty names

In addition to the entity’s name, some entities may have a pretty name. Similar to the use of entity names, pretty names can be used to reference an entity in queries. But unlike entity names, pretty names aren’t necessarily unique in the context of their container. When a container has multiple entities with the same pretty name, the pretty name can’t be used to reference the entity.

Pretty names allow middle-tier applications to map automatically created entity names (such as UUIDs) to names that are human-readable for display and referencing purposes.

For an example on how to assign a pretty name, see .alter database prettyname command.

Identifier naming rules

An identifier is the name you assign to entities, query parameters, or variables set through a let statement. Valid identifiers must follow these rules:

  • Identifiers are case-sensitive, except database names, which are case-insensitive.
  • Identifiers must be between 1 and 1024 characters long.
  • Identifiers may contain letters, digits, and underscores (_).
  • Identifiers may contain certain special characters: spaces, dots (.), and dashes (-). For information on how to reference identifiers with special characters, see Reference identifiers in queries.

Avoid naming identifiers as language keywords or literals

In KQL, there are keywords and literals that have similar naming rules as identifiers. You can have identifiers with the same name as keywords or literals. However, we recommend that you avoid doing so as referencing them in queries requires special quoting.

To avoid using an identifier that might also be a language keyword or literal, such as where, summarize, and 1day, you can choose your entity name according to the following conventions, which aren’t applicable to language keywords:

  • Use a name that starts with a capital letter (A to Z).

  • Use a name that starts or ends with a single underscore (_).

    Note: KQL reserves all identifiers that start or end with a sequence of two underscore characters (__); users can’t define such names for their own use.

For information on how to reference these identifiers, see Reference identifiers in queries.

Reference identifiers in queries

The following table provides an explanation on how to reference identifiers in queries.

| Identifier type | Identifier | Reference | Explanation |
|---|---|---|---|
| Normal | entity | entity | Identifiers (entity) that don’t include special characters or map to some language keyword don’t need to be enclosed in quotation marks. |
| Special character | entity-name | ['entity-name'] | Identifier names that include special characters (such as -) must be enclosed using [' and '] or using [" and "]. |
| Language keyword | where | ["where"] | Identifier names that are language keywords must be enclosed using [' and '] or [" and "]. |
| Literal | 1day | ["1day"] | Identifier names that are literals must be enclosed using [' and '] or [" and "]. |
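As a brief illustration, the following sketch declares and references such identifiers with bracket quoting. The column names are hypothetical and chosen only to mirror the rows in the table above:

datatable(['entity-name']:string, ['where']:long, ['1day']:real)
[
    "a", 1, 1.5
]
| project ['entity-name'], ["where"], ["1day"]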

4.5 - Entity references

This article describes Entity references.

Kusto entities are referenced in a query by name. Entities that can be referenced by their name include databases, tables, columns, and stored functions, but not clusters.

If the entity’s container is unambiguous in the current context, use the entity name without additional qualifications. For example, when running a query against a database called DB, you may reference a table called T in that database by its name, T.

If the entity’s container isn’t available from the context, or if you want to reference an entity from a container other than the one in context, use the entity’s qualified name. This name is the concatenation of the entity name with its container’s name, and potentially that container’s name, and so on. In this way, a query running against database DB may refer to a table T1 in a different database DB1 by using database("DB1").T1.

A query can also reference a table in another cluster, for example, by using cluster("https://C2.kusto.windows.net/").database("DB2").T2.
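The following sketch combines these levels of qualification in a single query. T, DB1, T1, C2, DB2, and T2 are the hypothetical names used in this section:

union
    T,                                                            // table in the database in context
    database("DB1").T1,                                           // table in another database on the same cluster
    cluster("https://C2.kusto.windows.net/").database("DB2").T2  // table in a database on another cluster
| count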

Entity references can also use the entity pretty name, as long as it’s unique in the context of the entity’s container. For more information, see entity pretty names.

Wildcard matching for entity names

In some contexts, you may use a wildcard (*) to match all or part of an entity name. For example, the following query references all tables in the current database, as well as all tables in database DB1 whose names start with T:

union *, database("DB1").T*

Wildcard matching can’t match entity names that begin with a dollar sign ($); such names are system-reserved.

4.6 - External tables

This article describes External tables.

An external table is a schema entity that references data stored external to a Kusto database.

Similar to tables, an external table has a well-defined schema (an ordered list of column name and data type pairs). Unlike tables where data is ingested into your cluster, external tables operate on data stored and managed outside your cluster.

Supported external data stores are:

  • Files stored in Azure Blob Storage or in Azure Data Lake. Most commonly the data is stored in some standard format such as CSV, JSON, Parquet, AVRO, etc. For the list of supported formats, refer to supported formats.
  • SQL table (SQL Server, MySql, PostgreSql, and Cosmos DB).

An external table can be referenced by its name using the external_table() function. External tables are created and managed through dedicated management commands.
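For example, the following minimal sketch queries an external table by name (MyExternalTable is a hypothetical name used only for illustration):

external_table("MyExternalTable")
| take 10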

For more information about how to query external tables, and ingested and uningested data, see Query data in Azure Data Lake using Azure Data Explorer.

To accelerate queries over external delta tables, see Query acceleration policy.

4.7 - Fact and dimension tables

This article describes Fact and dimension tables.

When designing the schema for a database, think of tables as broadly belonging to one of two categories.

Fact tables

Fact tables are tables whose records are immutable “facts”, such as service logs and measurement information. Records are progressively appended into the table in a streaming fashion or in large chunks. The records stay there until they’re removed because of cost or because they’ve lost their value. Records are otherwise never updated.

Entity data is sometimes held in fact tables, where the entity data changes slowly. For example, data about some physical entity, such as a piece of office equipment that infrequently changes location. Since data in Kusto is immutable, the common practice is to have each table hold two columns:

  • An identity (string) column that identifies the entity
  • A last-modified (datetime) timestamp column

Only the last record for each entity identity is then retrieved.
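A minimal sketch of this pattern, assuming a hypothetical fact table OfficeEquipment with an EquipmentId identity column and a LastModified timestamp column:

OfficeEquipment
| summarize arg_max(LastModified, *) by EquipmentId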

Dimension tables

Dimension tables:

  • Hold reference data, such as lookup tables from an entity identifier to its properties
  • Hold snapshot-like data in tables whose entire contents change in a single transaction

Dimension tables aren’t regularly ingested with new data. Instead, the entire data content is updated at once, using operations such as .set-or-replace, .move extents, or .rename tables.

Sometimes, dimension tables might be derived from fact tables. This process can be done via a materialized view on the fact table, with a query on the table that takes the last record for each entity.
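As a sketch, such a derived dimension table could be defined as a materialized view over the hypothetical fact table from the previous section (the names are illustrative only):

.create materialized-view EquipmentLatest on table OfficeEquipment
{
    OfficeEquipment
    | summarize arg_max(LastModified, *) by EquipmentId
}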

Differentiate fact and dimension tables

There are processes in Kusto that differentiate between fact tables and dimension tables. One of them is continuous export.

These mechanisms are guaranteed to process data in fact tables precisely once. They rely on the database cursor mechanism.

For example, every execution of a continuous export job exports all records that were ingested since the last update of the database cursor. Continuous export jobs must differentiate between fact tables and dimension tables: for fact tables, only newly ingested data is processed, while dimension tables are used as lookups, so the entire table must be taken into account.

There’s no way to “mark” a table as being a “fact table” or a “dimension table”. The way data is ingested into the table, and how the table is used, is what identifies its type.

4.8 - Stored functions

This article describes Stored functions.

Functions are reusable queries or query parts. Functions can be stored as database entities, similar to tables, called stored functions. Alternatively, functions can be created in an ad-hoc fashion with a let statement, called query-defined functions. For more information, see user-defined functions.

To create and manage stored functions, see the Stored functions management overview.

For more information on working with functions in Log Analytics, see Functions in Azure Monitor log queries.

4.9 - Tables

This article describes Tables.

Tables are named entities that hold data. A table has an ordered set of columns, and zero or more rows of data. Each row holds one data value for each of the columns of the table. The order of rows in the table is unknown, and doesn’t in general affect queries, except for some tabular operators (such as the top operator) that are inherently undetermined. For information on how to create and manage tables, see managing tables.

Tables occupy the same namespace as stored functions. If a stored function and a table both have the same name, the stored function will be chosen.

Reference tables in queries

The simplest way to reference a table is by using its name. This reference can be done for all tables that are in the database in context. For example, the following query counts the records of the current database’s StormEvents table:

StormEvents
| count

An equivalent way to write the query above is by escaping the table name:

["StormEvents"]
| count

Tables may also be referenced by explicitly noting the database they are in. Then you can author queries that combine data from multiple databases. For example, the following query will work with any database in context, as long as the caller has access to the target database:

cluster("https://help.kusto.windows.net").database("Samples").StormEvents
| count

It’s also possible to reference a table by using the table() special function, as long as the argument to that function evaluates to a constant. For example:

let counter=(TableName:string) { table(TableName) | count };
counter("StormEvents")

4.10 - Views

Learn how to define and use a view.

A view is a virtual table based on the result-set of a Kusto Query Language (KQL) query.

Like real tables, views organize data with rows and columns, and participate in tasks that involve wildcard table name resolution, such as union * and search * scenarios. However, unlike real tables, views don’t maintain dedicated data storage. Rather, they dynamically represent the result of a query.

How to define a view

Views are defined through user-defined functions, which come in two forms: query-defined functions and stored functions. To qualify as a view, a function must accept no arguments and yield a tabular expression as its output.

To define a query-defined function as a view, specify the view keyword before the function definition. For an example, see Query-defined view.

To define a stored function as a view, set the view property to true when you create the function. For an example, see Stored view. For more information, see the .create function command.

Examples

Query-defined view

The following query defines two functions: T_view and T_notview. The query results demonstrate that only T_view is resolved by the wildcard reference in the union operation.

let T_view = view () { print x=1 };
let T_notview = () { print x=2 };
union T*

Stored view

The following query defines a stored view. This view behaves like any other stored function, yet can partake in wildcard scenarios.

.create function 
    with (view=true, docstring='Simple demo view', folder='Demo')  
    MyView() { StormEvents | take 100 }

5 - Functions

5.1 - bartlett_test_fl()

This article describes the bartlett_test_fl() user-defined function.

The bartlett_test_fl() function is a user-defined tabular function that performs the Bartlett Test.

Syntax

T | invoke bartlett_test_fl(data1, data2, test_statistic, p_value)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
| data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
| test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
| p_value | string | ✔️ | The name of the column to store p-value for the results. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let bartlett_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.bartlett(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Bartlett Test")
bartlett_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.bartlett(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let bartlett_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.bartlett(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Example query that uses the function
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke bartlett_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Stored

datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke bartlett_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Output

| id | sample1 | sample2 | test_stat | p_val |
|---|---|---|---|---|
| Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 1.7660796224425723 | 0.183868001738637 |
| Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 1.9211710616896014 | 0.16572762069132516 |
| Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 0.0026985713829234454 | 0.958570306268548 |

5.2 - binomial_test_fl()

This article describes the binomial_test_fl() user-defined function.

The function binomial_test_fl() is a UDF (user-defined function) that performs the binomial test.

Syntax

T | invoke binomial_test_fl(successes, trials, p_value [, success_prob [, alt_hypotheis ]])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| successes | string | ✔️ | The name of the column containing the number of success results. |
| trials | string | ✔️ | The name of the column containing the total number of trials. |
| p_value | string | ✔️ | The name of the column to store the results. |
| success_prob | real | | The success probability. The default is 0.5. |
| alt_hypotheis | string | | The alternate hypothesis can be two-sided, greater, or less. The default is two-sided. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let binomial_test_fl = (tbl:(*), successes:string, trials:string, p_value:string, success_prob:real=0.5, alt_hypotheis:string='two-sided')
{
    let kwargs = bag_pack('successes', successes, 'trials', trials, 'p_value', p_value, 'success_prob', success_prob, 'alt_hypotheis', alt_hypotheis);
    let code = ```if 1:
        from scipy import stats
        
        successes = kargs["successes"]
        trials = kargs["trials"]
        p_value = kargs["p_value"]
        success_prob = kargs["success_prob"]
        alt_hypotheis = kargs["alt_hypotheis"]
        
        def func(row, prob, h1):
            pv = stats.binom_test(row[successes], row[trials], p=prob, alternative=h1)
            return pv
        result = df
        result[p_value] = df.apply(func, axis=1, args=(success_prob, alt_hypotheis), result_type="expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Binomial test")
binomial_test_fl(tbl:(*), successes:string, trials:string, p_value:string, success_prob:real=0.5, alt_hypotheis:string='two-sided')
{
    let kwargs = bag_pack('successes', successes, 'trials', trials, 'p_value', p_value, 'success_prob', success_prob, 'alt_hypotheis', alt_hypotheis);
    let code = ```if 1:
        from scipy import stats
        
        successes = kargs["successes"]
        trials = kargs["trials"]
        p_value = kargs["p_value"]
        success_prob = kargs["success_prob"]
        alt_hypotheis = kargs["alt_hypotheis"]
        
        def func(row, prob, h1):
            pv = stats.binom_test(row[successes], row[trials], p=prob, alternative=h1)
            return pv
        result = df
        result[p_value] = df.apply(func, axis=1, args=(success_prob, alt_hypotheis), result_type="expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let binomial_test_fl = (tbl:(*), successes:string, trials:string, p_value:string, success_prob:real=0.5, alt_hypotheis:string='two-sided')
{
    let kwargs = bag_pack('successes', successes, 'trials', trials, 'p_value', p_value, 'success_prob', success_prob, 'alt_hypotheis', alt_hypotheis);
    let code = ```if 1:
        from scipy import stats
        
        successes = kargs["successes"]
        trials = kargs["trials"]
        p_value = kargs["p_value"]
        success_prob = kargs["success_prob"]
        alt_hypotheis = kargs["alt_hypotheis"]
        
        def func(row, prob, h1):
            pv = stats.binom_test(row[successes], row[trials], p=prob, alternative=h1)
            return pv
        result = df
        result[p_value] = df.apply(func, axis=1, args=(success_prob, alt_hypotheis), result_type="expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, x:int, n:int) [
'Test #1', 3, 5,
'Test #2', 5, 5,
'Test #3', 3, 15
]
| extend p_val=0.0
| invoke binomial_test_fl('x', 'n', 'p_val', success_prob=0.2, alt_hypotheis='greater')

Stored

datatable(id:string, x:int, n:int) [
'Test #1', 3, 5,
'Test #2', 5, 5,
'Test #3', 3, 15
]
| extend p_val=0.0
| invoke binomial_test_fl('x', 'n', 'p_val', success_prob=0.2, alt_hypotheis='greater')

Output

| id | x | n | p_val |
|---|---|---|---|
| Test #1 | 3 | 5 | 0.05792 |
| Test #2 | 5 | 5 | 0.00032 |
| Test #3 | 3 | 15 | 0.601976790745087 |

5.3 - comb_fl()

This article describes comb_fl() user-defined function.

Calculate C(n, k)

The function comb_fl() is a user-defined function (UDF) that calculates C(n, k), the number of combinations for selection of k items out of n, without order. It’s based on the native gamma() function to calculate factorial. For more information, see factorial_fl(). For a selection of k items with order, use perm_fl().
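For reference, the formula that the function below implements, using the identity n! = gamma(n+1), is:

C(n, k) = n! / (k! * (n - k)!) = gamma(n + 1) / (gamma(k + 1) * gamma(n - k + 1))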

Syntax

comb_fl(n, k)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| n | int | ✔️ | The total number of items. |
| k | int | ✔️ | The number of selected items. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let comb_fl=(n:int, k:int)
{
    let fact_n = gamma(n+1);
    let fact_nk = gamma(n-k+1);
    let fact_k = gamma(k+1);
    tolong(fact_n/fact_nk/fact_k)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate number of combinations for selection of k items out of n items without order")
comb_fl(n:int, k:int)
{
    let fact_n = gamma(n+1);
    let fact_nk = gamma(n-k+1);
    let fact_k = gamma(k+1);
    tolong(fact_n/fact_nk/fact_k)
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let comb_fl=(n:int, k:int)
{
    let fact_n = gamma(n+1);
    let fact_nk = gamma(n-k+1);
    let fact_k = gamma(k+1);
    tolong(fact_n/fact_nk/fact_k)
};
range n from 3 to 10 step 3
| extend k = n-2
| extend cnk = comb_fl(n, k)

Stored

range n from 3 to 10 step 3
| extend k = n-2
| extend cnk = comb_fl(n, k)

Output

| n | k | cnk |
|---|---|---|
| 3 | 1 | 3 |
| 6 | 4 | 15 |
| 9 | 7 | 36 |

5.4 - dbscan_dynamic_fl()

This article describes the dbscan_dynamic_fl() user-defined function.

The function dbscan_dynamic_fl() is a UDF (user-defined function) that clusterizes a dataset using the DBSCAN algorithm. This function is similar to dbscan_fl(), except that the features are supplied in a single numerical array column rather than in multiple scalar columns.

Syntax

T | invoke dbscan_dynamic_fl(features_col, cluster_col, epsilon, min_samples, metric, metric_params)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| features_col | string | ✔️ | The name of the column containing the numeric array of features to be used for clustering. |
| cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record. |
| epsilon | real | ✔️ | The maximum distance between two samples to be considered as neighbors. |
| min_samples | int | | The number of samples in a neighborhood for a point to be considered as a core point. |
| metric | string | | The metric to use when calculating distance between points. |
| metric_params | dynamic | | Extra keyword arguments for the metric function. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let dbscan_dynamic_fl=(tbl:(*), features_col:string, cluster_col:string, epsilon:double, min_samples:int=10, metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
    let kwargs = bag_pack('features_col', features_col, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
                          'metric', metric, 'metric_params', metric_params);
    let code = ```if 1:

        from sklearn.cluster import DBSCAN
        from sklearn.preprocessing import StandardScaler

        features_col = kargs["features_col"]
        cluster_col = kargs["cluster_col"]
        epsilon = kargs["epsilon"]
        min_samples = kargs["min_samples"]
        metric = kargs["metric"]
        metric_params = kargs["metric_params"]

        df1 = df[features_col].apply(np.array)
        mat = np.vstack(df1.values)
        
        # Scale the dataframe
        scaler = StandardScaler()
        mat = scaler.fit_transform(mat)

        # see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics

        dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
        labels = dbscan.fit_predict(mat)

        result = df
        result[cluster_col] = labels
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\ML", docstring = "DBSCAN clustering of features passed as a single column containing numerical array")
dbscan_dynamic_fl(tbl:(*), features_col:string, cluster_col:string, epsilon:double, min_samples:int=10, metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
    let kwargs = bag_pack('features_col', features_col, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
                          'metric', metric, 'metric_params', metric_params);
    let code = ```if 1:

        from sklearn.cluster import DBSCAN
        from sklearn.preprocessing import StandardScaler

        features_col = kargs["features_col"]
        cluster_col = kargs["cluster_col"]
        epsilon = kargs["epsilon"]
        min_samples = kargs["min_samples"]
        metric = kargs["metric"]
        metric_params = kargs["metric_params"]

        df1 = df[features_col].apply(np.array)
        mat = np.vstack(df1.values)
        
        # Scale the dataframe
        scaler = StandardScaler()
        mat = scaler.fit_transform(mat)

        # see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics

        dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
        labels = dbscan.fit_predict(mat)

        result = df
        result[cluster_col] = labels
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Clustering of artificial dataset with three clusters

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let dbscan_dynamic_fl=(tbl:(*), features_col:string, cluster_col:string, epsilon:double, min_samples:int=10, metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
    let kwargs = bag_pack('features_col', features_col, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
                          'metric', metric, 'metric_params', metric_params);
    let code = ```if 1:

        from sklearn.cluster import DBSCAN
        from sklearn.preprocessing import StandardScaler

        features_col = kargs["features_col"]
        cluster_col = kargs["cluster_col"]
        epsilon = kargs["epsilon"]
        min_samples = kargs["min_samples"]
        metric = kargs["metric"]
        metric_params = kargs["metric_params"]

        df1 = df[features_col].apply(np.array)
        mat = np.vstack(df1.values)
        
        # Scale the dataframe
        scaler = StandardScaler()
        mat = scaler.fit_transform(mat)

        # see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics

        dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
        labels = dbscan.fit_predict(mat)

        result = df
        result[cluster_col] = labels
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
};
union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke dbscan_dynamic_fl("Features", "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)

Stored

union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke dbscan_dynamic_fl("Features", "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)

Screenshot of scatterchart of DBSCAN clustering of artificial dataset with three clusters.

5.5 - dbscan_fl()

This article describes the dbscan_fl() user-defined function.

The function dbscan_fl() is a UDF (user-defined function) that clusterizes a dataset using the DBSCAN algorithm.

Syntax

T | invoke dbscan_fl(features, cluster_col, epsilon, min_samples, metric, metric_params)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| features | dynamic | ✔️ | An array containing the names of the features columns to use for clustering. |
| cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record. |
| epsilon | real | ✔️ | The maximum distance between two samples to be considered as neighbors. |
| min_samples | int | | The number of samples in a neighborhood for a point to be considered as a core point. |
| metric | string | | The metric to use when calculating distance between points. |
| metric_params | dynamic | | Extra keyword arguments for the metric function. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let dbscan_fl=(tbl:(*), features:dynamic, cluster_col:string, epsilon:double, min_samples:int=10,
                       metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
    let kwargs = bag_pack('features', features, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
                          'metric', metric, 'metric_params', metric_params);
    let code = ```if 1:

        from sklearn.cluster import DBSCAN
        from sklearn.preprocessing import StandardScaler

        features = kargs["features"]
        cluster_col = kargs["cluster_col"]
        epsilon = kargs["epsilon"]
        min_samples = kargs["min_samples"]
        metric = kargs["metric"]
        metric_params = kargs["metric_params"]

        df1 = df[features]
        mat = df1.values
        
        # Scale the dataframe
        scaler = StandardScaler()
        mat = scaler.fit_transform(mat)

        # see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics

        dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
        labels = dbscan.fit_predict(mat)

        result = df
        result[cluster_col] = labels
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\ML", docstring = "DBSCAN clustering")
dbscan_fl(tbl:(*), features:dynamic, cluster_col:string, epsilon:double, min_samples:int=10,
                       metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
    let kwargs = bag_pack('features', features, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
                          'metric', metric, 'metric_params', metric_params);
    let code = ```if 1:

        from sklearn.cluster import DBSCAN
        from sklearn.preprocessing import StandardScaler

        features = kargs["features"]
        cluster_col = kargs["cluster_col"]
        epsilon = kargs["epsilon"]
        min_samples = kargs["min_samples"]
        metric = kargs["metric"]
        metric_params = kargs["metric_params"]

        df1 = df[features]
        mat = df1.values
        
        # Scale the dataframe
        scaler = StandardScaler()
        mat = scaler.fit_transform(mat)

        # see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics

        dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
        labels = dbscan.fit_predict(mat)

        result = df
        result[cluster_col] = labels
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Clustering of artificial dataset with three clusters

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let dbscan_fl=(tbl:(*), features:dynamic, cluster_col:string, epsilon:double, min_samples:int=10,
                       metric:string='minkowski', metric_params:dynamic=dynamic({'p': 2}))
{
    let kwargs = bag_pack('features', features, 'cluster_col', cluster_col, 'epsilon', epsilon, 'min_samples', min_samples,
                          'metric', metric, 'metric_params', metric_params);
    let code = ```if 1:

        from sklearn.cluster import DBSCAN
        from sklearn.preprocessing import StandardScaler

        features = kargs["features"]
        cluster_col = kargs["cluster_col"]
        epsilon = kargs["epsilon"]
        min_samples = kargs["min_samples"]
        metric = kargs["metric"]
        metric_params = kargs["metric_params"]

        df1 = df[features]
        mat = df1.values
        
        # Scale the dataframe
        scaler = StandardScaler()
        mat = scaler.fit_transform(mat)

        # see https://docs.scipy.org/doc/scipy/reference/spatial.distance.html for the various distance metrics

        dbscan = DBSCAN(eps=epsilon, min_samples=min_samples, metric=metric, metric_params=metric_params) # 'minkowski', 'chebyshev'
        labels = dbscan.fit_predict(mat)

        result = df
        result[cluster_col] = labels
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
};
union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke dbscan_fl(pack_array("x", "y"), "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| render scatterchart with(series=cluster_id)

Stored

union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke dbscan_fl(pack_array("x", "y"), "cluster_id", epsilon=0.6, min_samples=4, metric_params=dynamic({'p':2}))
| render scatterchart with(series=cluster_id)

Screenshot of scatterchart of DBSCAN clustering of artificial dataset with three clusters.

5.6 - detect_anomalous_new_entity_fl()

Learn how to use the detect_anomalous_new_entity_fl() function to detect the appearance of anomalous new entities.

Detect the appearance of anomalous new entities in timestamped data.

The function detect_anomalous_new_entity_fl() is a UDF (user-defined function) that detects the appearance of anomalous new entities - such as IP addresses or users - in timestamped data, such as traffic logs. In cybersecurity context, such events might be suspicious and indicate a potential attack or compromise.

The anomaly model is based on a Poisson distribution representing the number of new entities appearing per time bin (such as a day) for each scope. The Poisson distribution parameter is estimated based on the rate of appearance of new entities in the training period, with an added decay factor reflecting the fact that recent appearances are more important than old ones. We thus calculate the probability of encountering a new entity in the defined detection period for a given scope - such as a subscription or an account. The model output is controlled by several optional parameters, such as a minimal threshold for anomaly, a decay rate parameter, and others.

The model’s direct output is an anomaly score based on the inverse of the estimated probability of encountering a new entity. The score is monotonic in the range [0, 1], with 1 representing something anomalous. In addition to the anomaly score, there’s a binary flag for detected anomaly (controlled by a minimal threshold parameter), and other explanatory fields.
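Reading the scoring off the query in the function definition below, as a sketch of the calculation: if c_i is the number of entities first seen d_i days before the start of the detection period, and decayParam is the decay rate, then

lambda = sum_i(c_i * decayParam^(d_i)) / max_i(d_i)
newEntityProbability = 1 - exp(-lambda)
newEntityAnomalyScore = 1 - newEntityProbability = exp(-lambda)

That is, the anomaly score equals the Poisson probability of seeing no new entity, P(X = 0) = exp(-lambda), computed with the decayed rate estimate.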

Syntax

detect_anomalous_new_entity_fl(entityColumnName, scopeColumnName, timeColumnName, startTraining, startDetection, endDetection, [maxEntitiesThresh], [minTrainingDaysThresh], [decayParam], [anomalyScoreThresh])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| entityColumnName | string | ✔️ | The name of the input table column containing the names or IDs of the entities for which the anomaly model is calculated. |
| scopeColumnName | string | ✔️ | The name of the input table column containing the partition or scope, so that a different anomaly model is built for each scope. |
| timeColumnName | string | ✔️ | The name of the input table column containing the timestamps that are used to define the training and detection periods. |
| startTraining | datetime | ✔️ | The beginning of the training period for the anomaly model. Its end is defined by the beginning of the detection period. |
| startDetection | datetime | ✔️ | The beginning of the detection period for anomaly detection. |
| endDetection | datetime | ✔️ | The end of the detection period for anomaly detection. |
| maxEntitiesThresh | int | | The maximum number of existing entities in scope to calculate anomalies. If the number of entities is above the threshold, the scope is considered too noisy and anomalies aren’t calculated. The default value is 60. |
| minTrainingDaysThresh | int | | The minimum number of days in the training period that a scope must exist to calculate anomalies. If it is below the threshold, the scope is considered too new and unknown, so anomalies aren’t calculated. The default value is 14. |
| decayParam | real | | The decay rate parameter for the anomaly model, a number in range (0,1]. Lower values mean faster decay, so more importance is given to later appearances in the training period. A value of 1 means no decay, so a simple average is used for Poisson distribution parameter estimation. The default value is 0.95. |
| anomalyScoreThresh | real | | The minimum value of anomaly score for which an anomaly is detected, a number in range [0, 1]. Higher values mean that only more significant cases are considered anomalous, so fewer anomalies are detected (higher precision, lower recall). The default value is 0.9. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let detect_anomalous_new_entity_fl = (T:(*), entityColumnName:string, scopeColumnName:string
                                        , timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime
                                        , maxEntitiesThresh:int = 60, minTrainingDaysThresh:int = 14, decayParam:real = 0.95, anomalyScoreThresh:real = 0.9)
{
//pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day';      // we assume a reasonable bin for time is day, so the probability model is built per that bin size
let processedData = (
    T
    | extend scope      = column_ifexists(scopeColumnName, '')
    | extend entity     = column_ifexists(entityColumnName, '')
    | extend sliceTime  = todatetime(column_ifexists(timeColumnName, ''))
    | where isnotempty(scope) and isnotempty(entity) and isnotempty(sliceTime)
    | extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
                           , sliceTime >= startDetection and sliceTime <= endDetection,  'detectSet'
                                                                                       , 'other')
    | where dataSet in ('trainSet', 'detectSet')
);
// summarize the data by scope and entity. this will be used to create a distribution of entity appearances based on first seen data
let entityData = (
    processedData
    | summarize countRowsEntity = count(), firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime), firstSeenSet = arg_min(sliceTime, dataSet) 
        by scope, entity
    | extend firstSeenSet = dataSet
    | project-away dataSet
);
// aggregate entity data per scope and get the number of entities appearing over time
let aggregatedCandidateScopeData = (
    entityData
    | summarize countRowsScope = sum(countRowsEntity), countEntitiesScope = dcount(entity), countEntitiesScopeInTrain = dcountif(entity, firstSeenSet == 'trainSet')
        , firstSeenScope = min(firstSeenEntity), lastSeenScope = max(lastSeenEntity), hasNewEntities = iff(dcountif(entity,firstSeenSet == 'detectSet') > 0, 1, 0) 
            by scope
    | extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
    | where countEntitiesScopeInTrain <= maxEntitiesThresh and slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection and hasNewEntities == 1
);
let modelData = (
    entityData
    | join kind = inner (aggregatedCandidateScopeData) on scope 
    | where firstSeenSet == 'trainSet'
    | summarize countAddedEntities = dcount(entity), firstSeenScope = min(firstSeenScope), slicesInTrainingScope = max(slicesInTrainingScope), countEntitiesScope = max(countEntitiesScope)
        by scope, firstSeenSet, firstSeenEntity
    | extend diffInDays = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
// adding exponentially decaying weights to counts
    | extend decayingWeight = pow(base = decayParam, exponent = diffInDays)
    | extend decayingValue = countAddedEntities * decayingWeight
    | summarize   newEntityProbability = round(1 - exp(-1.0 * sum(decayingValue)/max(diffInDays)), 4)
                , countKnownEntities = sum(countAddedEntities), lastNewEntityTimestamp = max(firstSeenEntity), slicesOnScope = max(slicesInTrainingScope)///for explainability
        by scope, firstSeenSet
// anomaly score is based on probability to get no new entities, calculated using Poisson distribution (P(X=0) = exp(-avg)) with added decay on average
    | extend newEntityAnomalyScore = round(1 - newEntityProbability, 4)
    | extend isAnomalousNewEntity = iff(newEntityAnomalyScore >= anomalyScoreThresh, 1, 0)
);
let resultsData = (
    processedData
    | where dataSet == 'detectSet'
    | join kind = inner (modelData) on scope
	| project-away scope1
    | where isAnomalousNewEntity == 1
    | summarize arg_min(sliceTime, *) by scope, entity
    | extend anomalyType = strcat('newEntity_', entityColumnName), anomalyExplainability = strcat('The ', entityColumnName, ' ', entity, ' wasn\'t seen on ', scopeColumnName, ' ', scope, ' during the last ',  slicesOnScope, ' ', timePeriodBinSize, 's. Previously, ', countKnownEntities
        , ' entities were seen, the last one of them appearing at ', format_datetime(lastNewEntityTimestamp, 'yyyy-MM-dd HH:mm'), '.')
    | join kind = leftouter (entityData | where firstSeenSet == 'trainSet' | extend entityFirstSeens = strcat(entity, ' : ', format_datetime(firstSeenEntity, 'yyyy-MM-dd HH:mm')) | sort by scope, firstSeenEntity asc | summarize anomalyState = make_list(entityFirstSeens) by scope) on scope
    | project-away scope1
);
resultsData
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (docstring = "Detect new and anomalous entity (such as username or IP) per scope (such as subscription or account)", skipvalidation = "true", folder = 'KCL') 
    detect_anomalous_new_entity_fl(T:(*), entityColumnName:string, scopeColumnName:string
                                        , timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime
                                        , maxEntitiesThresh:int = 60, minTrainingDaysThresh:int = 14, decayParam:real = 0.95, anomalyScoreThresh:real = 0.9)
{
//pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day';      // we assume a reasonable bin for time is day, so the probability model is built per that bin size
let processedData = (
    T
    | extend scope      = column_ifexists(scopeColumnName, '')
    | extend entity     = column_ifexists(entityColumnName, '')
    | extend sliceTime  = todatetime(column_ifexists(timeColumnName, ''))
    | where isnotempty(scope) and isnotempty(entity) and isnotempty(sliceTime)
    | extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
                           , sliceTime >= startDetection and sliceTime <= endDetection,  'detectSet'
                                                                                       , 'other')
    | where dataSet in ('trainSet', 'detectSet')
);
// summarize the data by scope and entity. this will be used to create a distribution of entity appearances based on first seen data
let entityData = (
    processedData
    | summarize countRowsEntity = count(), firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime), firstSeenSet = arg_min(sliceTime, dataSet) 
        by scope, entity
    | extend firstSeenSet = dataSet
    | project-away dataSet
);
// aggregate entity data per scope and get the number of entities appearing over time
let aggregatedCandidateScopeData = (
    entityData
    | summarize countRowsScope = sum(countRowsEntity), countEntitiesScope = dcount(entity), countEntitiesScopeInTrain = dcountif(entity, firstSeenSet == 'trainSet')
        , firstSeenScope = min(firstSeenEntity), lastSeenScope = max(lastSeenEntity), hasNewEntities = iff(dcountif(entity,firstSeenSet == 'detectSet') > 0, 1, 0) 
            by scope
    | extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
    | where countEntitiesScopeInTrain <= maxEntitiesThresh and slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection and hasNewEntities == 1
);
let modelData = (
    entityData
    | join kind = inner (aggregatedCandidateScopeData) on scope 
    | where firstSeenSet == 'trainSet'
    | summarize countAddedEntities = dcount(entity), firstSeenScope = min(firstSeenScope), slicesInTrainingScope = max(slicesInTrainingScope), countEntitiesScope = max(countEntitiesScope)
        by scope, firstSeenSet, firstSeenEntity
    | extend diffInDays = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
// adding exponentially decaying weights to counts of 
    | extend decayingWeight = pow(base = decayParam, exponent = diffInDays)
    | extend decayingValue = countAddedEntities * decayingWeight
    | summarize   newEntityProbability = round(1 - exp(-1.0 * sum(decayingValue)/max(diffInDays)), 4)
                , countKnownEntities = sum(countAddedEntities), lastNewEntityTimestamp = max(firstSeenEntity), slicesOnScope = max(slicesInTrainingScope)///for explainability
        by scope, firstSeenSet
// anomaly score is based on probability to get no new entities, calculated using Poisson distribution (P(X=0) = exp(-avg)) with added decay on average
    | extend newEntityAnomalyScore = round(1 - newEntityProbability, 4)
    | extend isAnomalousNewEntity = iff(newEntityAnomalyScore >= anomalyScoreThresh, 1, 0)
);
let resultsData = (
    processedData
    | where dataSet == 'detectSet'
    | join kind = inner (modelData) on scope
    | project-away scope1
    | where isAnomalousNewEntity == 1
    | summarize arg_min(sliceTime, *) by scope, entity
    | extend anomalyType = strcat('newEntity_', entityColumnName), anomalyExplainability = strcat('The ', entityColumnName, ' ', entity, ' wasn\'t seen on ', scopeColumnName, ' ', scope, ' during the last ',  slicesOnScope, ' ', timePeriodBinSize, 's. Previously, ', countKnownEntities
        , ' entities were seen, the last one of them appearing at ', format_datetime(lastNewEntityTimestamp, 'yyyy-MM-dd HH:mm'), '.')
    | join kind = leftouter (entityData | where firstSeenSet == 'trainSet' | extend entityFirstSeens = strcat(entity, ' : ', format_datetime(firstSeenEntity, 'yyyy-MM-dd HH:mm')) | sort by scope, firstSeenEntity asc | summarize anomalyState = make_list(entityFirstSeens) by scope) on scope
    | project-away scope1
);
resultsData
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let detect_anomalous_new_entity_fl = (T:(*), entityColumnName:string, scopeColumnName:string
                                        , timeColumnName:string, startTraining:datetime, startDetection:datetime, endDetection:datetime
                                        , maxEntitiesThresh:int = 60, minTrainingDaysThresh:int = 14, decayParam:real = 0.95, anomalyScoreThresh:real = 0.9)
{
//pre-process the input data by adding standard column names and dividing to datasets
let timePeriodBinSize = 'day';      // we assume a reasonable bin for time is day, so the probability model is built per that bin size
let processedData = (
    T
    | extend scope      = column_ifexists(scopeColumnName, '')
    | extend entity     = column_ifexists(entityColumnName, '')
    | extend sliceTime  = todatetime(column_ifexists(timeColumnName, ''))
    | where isnotempty(scope) and isnotempty(entity) and isnotempty(sliceTime)
    | extend dataSet = case((sliceTime >= startTraining and sliceTime < startDetection), 'trainSet'
                           , sliceTime >= startDetection and sliceTime <= endDetection,  'detectSet'
                                                                                       , 'other')
    | where dataSet in ('trainSet', 'detectSet')
);
// summarize the data by scope and entity. this will be used to create a distribution of entity appearances based on first seen data
let entityData = (
    processedData
    | summarize countRowsEntity = count(), firstSeenEntity = min(sliceTime), lastSeenEntity = max(sliceTime), firstSeenSet = arg_min(sliceTime, dataSet) 
        by scope, entity
    | extend firstSeenSet = dataSet
    | project-away dataSet
);
// aggregate entity data per scope and get the number of entities appearing over time
let aggregatedCandidateScopeData = (
    entityData
    | summarize countRowsScope = sum(countRowsEntity), countEntitiesScope = dcount(entity), countEntitiesScopeInTrain = dcountif(entity, firstSeenSet == 'trainSet')
        , firstSeenScope = min(firstSeenEntity), lastSeenScope = max(lastSeenEntity), hasNewEntities = iff(dcountif(entity,firstSeenSet == 'detectSet') > 0, 1, 0) 
            by scope
    | extend slicesInTrainingScope = datetime_diff(timePeriodBinSize, startDetection, firstSeenScope)
    | where countEntitiesScopeInTrain <= maxEntitiesThresh and slicesInTrainingScope >= minTrainingDaysThresh and lastSeenScope >= startDetection and hasNewEntities == 1
);
let modelData = (
    entityData
    | join kind = inner (aggregatedCandidateScopeData) on scope 
    | where firstSeenSet == 'trainSet'
    | summarize countAddedEntities = dcount(entity), firstSeenScope = min(firstSeenScope), slicesInTrainingScope = max(slicesInTrainingScope), countEntitiesScope = max(countEntitiesScope)
        by scope, firstSeenSet, firstSeenEntity
    | extend diffInDays = datetime_diff(timePeriodBinSize, startDetection, firstSeenEntity)
// adding exponentially decaying weights to counts
    | extend decayingWeight = pow(base = decayParam, exponent = diffInDays)
    | extend decayingValue = countAddedEntities * decayingWeight
    | summarize   newEntityProbability =  round(1 - exp(-1.0 * sum(decayingValue)/max(diffInDays)), 4)
                , countKnownEntities = sum(countAddedEntities), lastNewEntityTimestamp = max(firstSeenEntity), slicesOnScope = max(slicesInTrainingScope)///for explainability
        by scope, firstSeenSet
// anomaly score is based on probability to get no new entities, calculated using Poisson distribution (P(X=0) = exp(-avg)) with added decay on average
    | extend newEntityAnomalyScore = round(1 - newEntityProbability, 4)
    | extend isAnomalousNewEntity = iff(newEntityAnomalyScore >= anomalyScoreThresh, 1, 0)
);
let resultsData = (
    processedData
    | where dataSet == 'detectSet'
    | join kind = inner (modelData) on scope
    | project-away scope1
    | where isAnomalousNewEntity == 1
    | summarize arg_min(sliceTime, *) by scope, entity
    | extend anomalyType = strcat('newEntity_', entityColumnName), anomalyExplainability = strcat('The ', entityColumnName, ' ', entity, ' wasn\'t seen on ', scopeColumnName, ' ', scope, ' during the last ',  slicesOnScope, ' ', timePeriodBinSize, 's. Previously, ', countKnownEntities
        , ' entities were seen, the last one of them appearing at ', format_datetime(lastNewEntityTimestamp, 'yyyy-MM-dd HH:mm'), '.')
    | join kind = leftouter (entityData | where firstSeenSet == 'trainSet' | extend entityFirstSeens = strcat(entity, ' : ', format_datetime(firstSeenEntity, 'yyyy-MM-dd HH:mm')) | sort by scope, firstSeenEntity asc | summarize anomalyState = make_list(entityFirstSeens) by scope) on scope
    | project-away scope1
);
resultsData
};
// synthetic data generation
let detectPeriodStart   = datetime(2022-04-30 05:00:00.0000000);
let trainPeriodStart    = datetime(2022-03-01 05:00);
let names               = pack_array("Admin", "Dev1", "Dev2", "IT-support");
let countNames          = array_length(names);
let testData            = range t from 1 to 24*60 step 1
    | extend timeSlice      = trainPeriodStart + 1h * t
    | extend countEvents    = round(2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)), 2) * 100 // generate a series with weekly seasonality
    | extend userName       = tostring(names[toint(rand(countNames))])
    | extend deviceId       = hash_md5(rand())
    | extend accountName    = iff(((rand() < 0.2) and (timeSlice < detectPeriodStart)), 'testEnvironment', 'prodEnvironment')
    | extend userName       = iff(timeSlice == detectPeriodStart, 'H4ck3r', userName)
    | extend deviceId       = iff(timeSlice == detectPeriodStart, 'abcdefghijklmnoprtuvwxyz012345678', deviceId)
    | sort by timeSlice desc
;
testData
| invoke detect_anomalous_new_entity_fl(entityColumnName    = 'userName'  // userName for positive detection, deviceId for negative
                                , scopeColumnName           = 'accountName'
                                , timeColumnName            = 'timeSlice'
                                , startTraining             = trainPeriodStart
                                , startDetection            = detectPeriodStart
                                , endDetection              = detectPeriodStart
                            )

Stored

let detectPeriodStart   = datetime(2022-04-30 05:00:00.0000000);
let trainPeriodStart    = datetime(2022-03-01 05:00);
let names               = pack_array("Admin", "Dev1", "Dev2", "IT-support");
let countNames          = array_length(names);
let testData            = range t from 1 to 24*60 step 1
    | extend timeSlice      = trainPeriodStart + 1h * t
    | extend countEvents    = round(2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)), 2) * 100 // generate a series with weekly seasonality
    | extend userName       = tostring(names[toint(rand(countNames))])
    | extend deviceId       = hash_md5(rand())
    | extend accountName    = iff(((rand() < 0.2) and (timeSlice < detectPeriodStart)), 'testEnvironment', 'prodEnvironment')
    | extend userName       = iff(timeSlice == detectPeriodStart, 'H4ck3r', userName)
    | extend deviceId       = iff(timeSlice == detectPeriodStart, 'abcdefghijklmnoprtuvwxyz012345678', deviceId)
    | sort by timeSlice desc
;
testData
| invoke detect_anomalous_new_entity_fl(entityColumnName    = 'userName'
                                , scopeColumnName           = 'accountName'
                                , timeColumnName            = 'timeSlice'
                                , startTraining             = trainPeriodStart
                                , startDetection            = detectPeriodStart
                                , endDetection              = detectPeriodStart
                            )

Output

scope | entity | sliceTime | t | timeSlice | countEvents | userName | deviceId | accountName | dataSet | firstSeenSet | newEntityProbability | countKnownEntities | lastNewEntityTimestamp | slicesOnScope | newEntityAnomalyScore | isAnomalousNewEntity | anomalyType | anomalyExplainability | anomalyState
prodEnvironment | H4ck3r | 2022-04-30 05:00:00.0000000 | 1440 | 2022-04-30 05:00:00.0000000 | 1687 | H4ck3r | abcdefghijklmnoprtuvwxyz012345678 | prodEnvironment | detectSet | trainSet | 0.0031 | 4 | 2022-03-01 09:00:00.0000000 | 60 | 0.9969 | 1 | newEntity_userName | The userName H4ck3r wasn't seen on accountName prodEnvironment during the last 60 days. Previously, 4 entities were seen, the last one of them appearing at 2022-03-01 09:00. | ["IT-support : 2022-03-01 07:00", "Admin : 2022-03-01 08:00", "Dev2 : 2022-03-01 09:00", "Dev1 : 2022-03-01 14:00"]

The output of running the function is the first-seen row in the test dataset for each entity per scope, filtered for new entities (meaning they didn’t appear during the training period) that were tagged as anomalous (meaning that the entity anomaly score was above anomalyScoreThresh). Some other fields are added for clarity:

  • dataSet: current dataset (is always detectSet).
  • firstSeenSet: dataset in which the scope was first seen (should be 'trainSet').
  • newEntityProbability: probability to see any new entity based on Poisson model estimation.
  • countKnownEntities: existing entities on scope.
  • lastNewEntityTimestamp: last time a new entity was seen before the anomalous one.
  • slicesOnScope: count of slices per scope.
  • newEntityAnomalyScore: anomaly score of the new entity, in the range [0, 1]; higher values mean a stronger anomaly.
  • isAnomalousNewEntity: binary flag for anomalous new entities.
  • anomalyType: shows the type of anomaly (helpful when running several anomaly detection logics together).
  • anomalyExplainability: textual wrapper for generated anomaly and its explanation.
  • anomalyState: bag of existing entities on scope with their first seen times.

Running this function on user per account with default parameters finds a previously unseen and anomalous user (‘H4ck3r’) with a high anomaly score of 0.9969, meaning that its appearance is unexpected (due to the small number of existing users in the training period).
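
The score is derived directly from the estimated probability (see the extend step near the end of modelData in the function code): newEntityAnomalyScore = round(1 - newEntityProbability, 4). A quick check with the values from the output row above:

print newEntityAnomalyScore = round(1 - 0.0031, 4)   // 0.9969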

When we run the function with default parameters on deviceId as the entity, we won’t see an anomaly, due to the large number of existing devices, which makes a new device expected. However, if we lower the parameter anomalyScoreThresh to 0.0001 and raise the parameter maxEntitiesThresh to 10000, we effectively decrease precision in favor of recall, and detect an anomaly (with a low anomaly score) on device ‘abcdefghijklmnoprtuvwxyz012345678’, as sketched below.
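
The following sketch shows that run, assuming the same testData, trainPeriodStart, and detectPeriodStart let statements as in the example above; all other parameters keep their defaults:

testData
| invoke detect_anomalous_new_entity_fl(entityColumnName    = 'deviceId'
                                , scopeColumnName           = 'accountName'
                                , timeColumnName            = 'timeSlice'
                                , startTraining             = trainPeriodStart
                                , startDetection            = detectPeriodStart
                                , endDetection              = detectPeriodStart
                                , maxEntitiesThresh         = 10000
                                , anomalyScoreThresh        = 0.0001
                            )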

The output shows the anomalous entities together with explanation fields in a standardized format. These fields are useful for investigating the anomaly and for running anomalous entity detection on several entities, or together with other algorithms.

The suggested usage in a cybersecurity context is running the function on meaningful entities - such as usernames or IP addresses - per meaningful scopes - such as subscriptions or accounts. A detected anomalous new entity means that its appearance isn’t expected on the scope, and might be suspicious.

5.7 - factorial_fl()

This article describes factorial_fl() user-defined function.

Calculate factorial.

The function factorial_fl() is a UDF (user-defined function) that calculates the factorial of a positive integer (n!). It’s a simple wrapper around the native gamma() function.
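
Because n! equals gamma(n+1), the wrapper only shifts its argument before calling the native function. As an illustrative check (the column alias is arbitrary), the following query returns 120, that is, 5!:

print five_factorial = gamma(5 + 1)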

Syntax

factorial_fl(n)

Parameters

Name | Type | Required | Description
n | int | ✔️ | The input integer for which to calculate the factorial.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let factorial_fl=(n:int)
{
    gamma(n+1)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate factorial")
factorial_fl(n:int)
{
    gamma(n+1)
}

Example

Query-defined

let factorial_fl=(n:int)
{
    gamma(n+1)
};
range x from 1 to 10 step 3
| extend fx = factorial_fl(x)

Stored

range x from 1 to 10 step 3
| extend fx = factorial_fl(x)

Output

x | fx
1 | 1
4 | 24
7 | 5040
10 | 3628799

5.8 - Functions

This article describes Functions.

Functions are reusable queries or query parts. Kusto supports two kinds of functions:

  • Built-in functions are hard-coded functions defined by Kusto that can’t be modified by users.

  • User-defined functions, which are divided into two types:

    • Stored functions: user-defined functions that are stored and managed as database schema entities, similar to tables. For more information, see Stored functions. To create a stored function, use the .create function command. A minimal sketch of both kinds appears at the end of this section.

    • Query-defined functions: user-defined functions that are defined and used within the scope of a single query. The definition of such functions is done through a let statement. For more information on how to create query-defined functions, see Create a user defined function.

    For more information on user-defined functions, see User-defined functions.
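
As a minimal sketch of both kinds (the function name add_one is arbitrary):

// Query-defined: exists only for the duration of the query that declares it.
let add_one = (x: long) { x + 1 };
print result = add_one(41)

// Stored: persisted in the database schema (requires Database User permissions):
// .create function add_one(x: long) { x + 1 }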

5.9 - Functions library

This article describes user-defined functions that extend query environment capabilities.

The following article contains a categorized list of UDFs (user-defined functions).

The code for each user-defined function is provided in its article. The code can be used within a let statement embedded in a query, or it can be persisted in a database using .create function.

Cybersecurity functions

Function NameDescription
detect_anomalous_new_entity_fl()Detect the appearance of anomalous new entities in timestamped data.
detect_anomalous_spike_fl()Detect the appearance of anomalous spikes in numeric variables in timestamped data.
graph_blast_radius_fl()Calculate the Blast Radius (list and score) of source nodes over path or edge data.
graph_exposure_perimeter_fl()Calculate the Exposure Perimeter (list and score) of target nodes over path or edge data.
graph_path_discovery_fl()Discover valid paths between relevant endpoints (sources and targets) over graph data (edge and nodes).

General functions

Function NameDescription
geoip_fl()Retrieves geographic information of ip address.
get_packages_version_fl()Returns version information of the Python engine and the specified packages.

Machine learning functions

Function NameDescription
dbscan_fl()Clusterize using the DBSCAN algorithm, features are in separate columns.
dbscan_dynamic_fl()Clusterize using the DBSCAN algorithm, features are in a single dynamic column.
kmeans_fl()Clusterize using the K-Means algorithm, features are in separate columns.
kmeans_dynamic_fl()Clusterize using the K-Means algorithm, features are in a single dynamic column.
predict_fl()Predict using an existing trained machine learning model.
predict_onnx_fl()Predict using an existing trained machine learning model in ONNX format.

Plotly functions

The following section contains functions for rendering interactive Plotly charts.

Function NameDescription
plotly_anomaly_fl()Render anomaly chart using a Plotly template.
plotly_gauge_fl()Render gauge chart using a Plotly template.
plotly_scatter3d_fl()Render 3D scatter chart using a Plotly template.

PromQL functions

The following section contains common PromQL functions. These functions can be used for analysis of metrics ingested to your database by the Prometheus monitoring system. All functions assume that metrics in your database are structured using the Prometheus data model.

Function NameDescription
series_metric_fl()Select and retrieve time series stored with the Prometheus data model.
series_rate_fl()Calculate the average rate of counter metric increase per second.

Series processing functions

Function NameDescription
quantize_fl()Quantize metric columns.
series_clean_anomalies_fl()Replace anomalies in a series by interpolated value.
series_cosine_similarity_fl()Calculate the cosine similarity of two numerical vectors.
series_dbl_exp_smoothing_fl()Apply a double exponential smoothing filter on series.
series_dot_product_fl()Calculate the dot product of two numerical vectors.
series_downsample_fl()Downsample time series by an integer factor.
series_exp_smoothing_fl()Apply a basic exponential smoothing filter on series.
series_fit_lowess_fl()Fit a local polynomial to series using LOWESS method.
series_fit_poly_fl()Fit a polynomial to series using regression analysis.
series_fbprophet_forecast_fl()Forecast time series values using the Prophet algorithm.
series_lag_fl()Apply a lag filter on series.
series_monthly_decompose_anomalies_fl()Detect anomalies in a series with monthly seasonality.
series_moving_avg_fl()Apply a moving average filter on series.
series_moving_var_fl()Apply a moving variance filter on series.
series_mv_ee_anomalies_fl()Multivariate Anomaly Detection for series using elliptical envelope model.
series_mv_if_anomalies_fl()Multivariate Anomaly Detection for series using isolation forest model.
series_mv_oc_anomalies_fl()Multivariate Anomaly Detection for series using one class SVM model.
series_rolling_fl()Apply a rolling aggregation function on series.
series_shapes_fl()Detects positive/negative trend or jump in series.
series_uv_anomalies_fl()Detect anomalies in time series using the Univariate Anomaly Detection Cognitive Service API.
series_uv_change_points_fl()Detect change points in time series using the Univariate Anomaly Detection Cognitive Service API.
time_weighted_avg_fl()Calculates the time weighted average of a metric using fill forward interpolation.
time_weighted_avg2_fl()Calculates the time weighted average of a metric using linear interpolation.
time_weighted_val_fl()Calculates the time weighted value of a metric using linear interpolation.
time_window_rolling_avg_fl()Calculates the rolling average of a metric over a constant duration time window.

Statistical and probability functions

Function NameDescription
bartlett_test_fl()Perform the Bartlett test.
binomial_test_fl()Perform the binomial test.
comb_fl()Calculate C(n, k), the number of combinations for selection of k items out of n.
factorial_fl()Calculate n!, the factorial of n.
ks_test_fl()Perform a Kolmogorov Smirnov test.
levene_test_fl()Perform a Levene test.
normality_test_fl()Performs the Normality Test.
mann_whitney_u_test_fl()Perform a Mann-Whitney U Test.
pair_probabilities_fl()Calculate various probabilities and related metrics for a pair of categorical variables.
pairwise_dist_fl()Calculate pairwise distances between entities based on multiple nominal and numerical variables.
percentiles_linear_fl()Calculate percentiles using linear interpolation between closest ranks
perm_fl()Calculate P(n, k), the number of permutations for selection of k items out of n.
two_sample_t_test_fl()Perform the two sample t-test.
wilcoxon_test_fl()Perform the Wilcoxon Test.

Text analytics

Function NameDescription
log_reduce_fl()Find common patterns in textual logs and output a summary table.
log_reduce_full_fl()Find common patterns in textual logs and output a full table.
log_reduce_predict_fl()Apply a trained model to find common patterns in textual logs and output a summary table.
log_reduce_predict_full_fl()Apply a trained model to find common patterns in textual logs and output a full table.
log_reduce_train_fl()Find common patterns in textual logs and output a model.

5.10 - geoip_fl()

Learn how to use the geoip_fl() user-defined function.

geoip_fl() is a user-defined function that retrieves the geographic information of an IP address.

Syntax

T | invoke geoip_fl(ip_col, country_col, state_col, city_col, longitude_col, latitude_col)

Parameters

Name | Type | Required | Description
ip_col | string | ✔️ | The name of the column containing the IP addresses to resolve.
country_col | string | ✔️ | The name of the column to store the retrieved country.
state_col | string | ✔️ | The name of the column to store the retrieved state.
city_col | string | ✔️ | The name of the column to store the retrieved city.
longitude_col | string | ✔️ | The name of the column to store the retrieved longitude.
latitude_col | string | ✔️ | The name of the column to store the retrieved latitude.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let geoip_fl=(tbl:(*), ip_col:string, country_col:string, state_col:string, city_col:string, longitude_col:string, latitude_col:string)
{
    let kwargs = bag_pack('ip_col', ip_col, 'country_col', country_col, 'state_col', state_col, 'city_col', city_col, 'longitude_col', longitude_col, 'latitude_col', latitude_col);
    let code= ```if 1:
        from sandbox_utils import Zipackage
        Zipackage.install('geoip2.zip')
        import geoip2.database

        ip_col = kargs['ip_col']
        country_col = kargs['country_col']
        state_col = kargs['state_col']
        city_col = kargs['city_col']
        longitude_col = kargs['longitude_col']
        latitude_col = kargs['latitude_col']
        result=df
        reader = geoip2.database.Reader(r'C:\\Temp\\GeoLite2-City.mmdb')

        def geodata(ip):
            try:
                gd = reader.city(ip)
                geo = pd.Series((gd.country.name, gd.subdivisions.most_specific.name, gd.city.name, gd.location.longitude, gd.location.latitude))
            except:
                geo = pd.Series((None, None, None, None, None))
            return geo
        
        result[[country_col, state_col, city_col, longitude_col, latitude_col]] = result[ip_col].apply(geodata)
        
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs,
        external_artifacts =
        pack('geoip2.zip', 'https://artifactswestus.blob.core.windows.net/public/geoip2-4.6.0.zip',
             'GeoLite2-City.mmdb', 'https://artifactswestus.blob.core.windows.net/public/GeoLite2-City-20230221.mmdb')
        )
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = 'Packages\\Utils', docstring = 'Retrieve geographics of ip address')
geoip_fl(tbl:(*), ip_col:string, country_col:string, state_col:string, city_col:string, longitude_col:string, latitude_col:string)
{
    let kwargs = bag_pack('ip_col', ip_col, 'country_col', country_col, 'state_col', state_col, 'city_col', city_col, 'longitude_col', longitude_col, 'latitude_col', latitude_col);
    let code= ```if 1:
        from sandbox_utils import Zipackage
        Zipackage.install('geoip2.zip')
        import geoip2.database

        ip_col = kargs['ip_col']
        country_col = kargs['country_col']
        state_col = kargs['state_col']
        city_col = kargs['city_col']
        longitude_col = kargs['longitude_col']
        latitude_col = kargs['latitude_col']
        result=df
        reader = geoip2.database.Reader(r'C:\\Temp\\GeoLite2-City.mmdb')

        def geodata(ip):
            try:
                gd = reader.city(ip)
                geo = pd.Series((gd.country.name, gd.subdivisions.most_specific.name, gd.city.name, gd.location.longitude, gd.location.latitude))
            except:
                geo = pd.Series((None, None, None, None, None))
            return geo
        
        result[[country_col, state_col, city_col, longitude_col, latitude_col]] = result[ip_col].apply(geodata)
        
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs,
        external_artifacts =
        pack('geoip2.zip', 'https://artifactswestus.blob.core.windows.net/public/geoip2-4.6.0.zip',
             'GeoLite2-City.mmdb', 'https://artifactswestus.blob.core.windows.net/public/GeoLite2-City-20230221.mmdb')
        )
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let geoip_fl=(tbl:(*), ip_col:string, country_col:string, state_col:string, city_col:string, longitude_col:string, latitude_col:string)
{
    let kwargs = bag_pack('ip_col', ip_col, 'country_col', country_col, 'state_col', state_col, 'city_col', city_col, 'longitude_col', longitude_col, 'latitude_col', latitude_col);
    let code= ```if 1:
        from sandbox_utils import Zipackage
        Zipackage.install('geoip2.zip')
        import geoip2.database

        ip_col = kargs['ip_col']
        country_col = kargs['country_col']
        state_col = kargs['state_col']
        city_col = kargs['city_col']
        longitude_col = kargs['longitude_col']
        latitude_col = kargs['latitude_col']
        result=df
        reader = geoip2.database.Reader(r'C:\\Temp\\GeoLite2-City.mmdb')

        def geodata(ip):
            try:
                gd = reader.city(ip)
                geo = pd.Series((gd.country.name, gd.subdivisions.most_specific.name, gd.city.name, gd.location.longitude, gd.location.latitude))
            except:
                geo = pd.Series((None, None, None, None, None))
            return geo
        
        result[[country_col, state_col, city_col, longitude_col, latitude_col]] = result[ip_col].apply(geodata)
        
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs,
        external_artifacts =
        pack('geoip2.zip', 'https://artifactswestus.blob.core.windows.net/public/geoip2-4.6.0.zip',
             'GeoLite2-City.mmdb', 'https://artifactswestus.blob.core.windows.net/public/GeoLite2-City-20230221.mmdb')
        )
};
datatable(ip:string) [
'8.8.8.8',
'20.53.203.50',
'20.81.111.85',
'20.103.85.33',
'20.84.181.62',
'205.251.242.103',
]
| extend country='', state='', city='', longitude=real(null), latitude=real(null)
| invoke geoip_fl('ip','country', 'state', 'city', 'longitude', 'latitude')

Stored

datatable(ip:string) [
'8.8.8.8',
'20.53.203.50',
'20.81.111.85',
'20.103.85.33',
'20.84.181.62',
'205.251.242.103',
]
| extend country='', state='', city='', longitude=real(null), latitude=real(null)
| invoke geoip_fl('ip','country', 'state', 'city', 'longitude', 'latitude')

Output

ip | country | state | city | longitude | latitude
20.103.85.33 | Netherlands | North Holland | Amsterdam | 4.8883 | 52.3716
20.53.203.50 | Australia | New South Wales | Sydney | 151.2006 | -33.8715
20.81.111.85 | United States | Virginia | Tappahannock | -76.8545 | 37.9273
20.84.181.62 | United States | Iowa | Des Moines | -93.6124 | 41.6021
205.251.242.103 | United States | Virginia | Ashburn | -77.4903 | 39.0469
8.8.8.8 | United States | California | Los Angeles | -118.2441 | 34.0544

5.11 - get_packages_version_fl()

Learn how to use the get_packages_version_fl() user-defined function.

get_packages_version_fl() is a user-defined function that retrieves the versions of the Python engine and packages of the inline python() plugin.

The function accepts a dynamic array containing the names of the packages to check, and returns their respective versions and the Python engine version.

Syntax

T | invoke get_packages_version_fl(packages)

Parameters

Name | Type | Required | Description
packages | dynamic | | A dynamic array containing the names of the packages to check. The default is an empty list, which retrieves only the Python engine version.
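
Since packages defaults to an empty list, the function can be invoked with no arguments to return only the Python engine version, for example (assuming the stored function exists in the database):

get_packages_version_fl()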

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let get_packages_version_fl = (packages:dynamic=dynamic([]))
{
    let kwargs = pack('packages', packages);
    let code =
    ```if 1:
        import importlib
        import sys
        
        packages = kargs["packages"]
        result = pd.DataFrame(columns=["name", "ver"])
        for i in range(len(packages)):
            result.loc[i, "name"] = packages[i]
            try:
                m = importlib.import_module(packages[i])
                result.loc[i, "ver"] = m.__version__ if hasattr(m, "__version__") else "missing __version__ attribute"
            except Exception as ex:
                result.loc[i, "ver"] = "ERROR: " + (ex.msg if hasattr(ex, "msg") else "exception, no msg")
        id = result.shape[0]
        result.loc[id, "name"] = "Python"
        result.loc[id, "ver"] = sys.version
    ```;
    print 1
    | evaluate python(typeof(name:string , ver:string), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Utils", docstring = "Returns version information of the Python engine and the specified packages")
get_packages_version_fl(packages:dynamic=dynamic([]))
{
    let kwargs = pack('packages', packages);
    let code =
    ```if 1:
        import importlib
        import sys
        
        packages = kargs["packages"]
        result = pd.DataFrame(columns=["name", "ver"])
        for i in range(len(packages)):
            result.loc[i, "name"] = packages[i]
            try:
                m = importlib.import_module(packages[i])
                result.loc[i, "ver"] = m.__version__ if hasattr(m, "__version__") else "missing __version__ attribute"
            except Exception as ex:
                result.loc[i, "ver"] = "ERROR: " + (ex.msg if hasattr(ex, "msg") else "exception, no msg")
        id = result.shape[0]
        result.loc[id, "name"] = "Python"
        result.loc[id, "ver"] = sys.version
    ```;
    print 1
    | evaluate python(typeof(name:string , ver:string), code, kwargs)
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let get_packages_version_fl = (packages:dynamic=dynamic([]))
{
    let kwargs = pack('packages', packages);
    let code =
    ```if 1:
        import importlib
        import sys
        
        packages = kargs["packages"]
        result = pd.DataFrame(columns=["name", "ver"])
        for i in range(len(packages)):
            result.loc[i, "name"] = packages[i]
            try:
                m = importlib.import_module(packages[i])
                result.loc[i, "ver"] = m.__version__ if hasattr(m, "__version__") else "missing __version__ attribute"
            except Exception as ex:
                result.loc[i, "ver"] = "ERROR: " + (ex.msg if hasattr(ex, "msg") else "exception, no msg")
        id = result.shape[0]
        result.loc[id, "name"] = "Python"
        result.loc[id, "ver"] = sys.version
    ```;
    print 1
    | evaluate python(typeof(name:string , ver:string), code, kwargs)
};
get_packages_version_fl(pack_array('numpy', 'scipy', 'pandas', 'statsmodels', 'sklearn', 'onnxruntime', 'plotly'))

Stored

get_packages_version_fl(pack_array('numpy', 'scipy', 'pandas', 'statsmodels', 'sklearn', 'onnxruntime', 'plotly'))

Output

name | ver
numpy | 1.23.4
onnxruntime | 1.13.1
pandas | 1.5.1
plotly | 5.11.0
Python | 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
scipy | 1.9.3
sklearn | 1.1.3
statsmodels | 0.13.2

5.12 - kmeans_dynamic_fl()

This article describes the kmeans_dynamic_fl() user-defined function.

The function kmeans_dynamic_fl() is a UDF (user-defined function) that clusterizes a dataset using the k-means algorithm. This function is similar to kmeans_fl(), except that the features are supplied in a single numeric array column rather than in multiple scalar columns.

Syntax

T | invoke kmeans_dynamic_fl(k, features_col, cluster_col)

Parameters

Name | Type | Required | Description
k | int | ✔️ | The number of clusters.
features_col | string | ✔️ | The name of the column containing the numeric array of features to be used for clustering.
cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let kmeans_dynamic_fl=(tbl:(*),k:int, features_col:string, cluster_col:string)
{
    let kwargs = bag_pack('k', k, 'features_col', features_col, 'cluster_col', cluster_col);
    let code = ```if 1:

        from sklearn.cluster import KMeans

        k = kargs["k"]
        features_col = kargs["features_col"]
        cluster_col = kargs["cluster_col"]

        df1 = df[features_col].apply(np.array)
        matrix = np.vstack(df1.values)
        kmeans = KMeans(n_clusters=k, random_state=0)
        kmeans.fit(matrix)
        result = df
        result[cluster_col] = kmeans.labels_
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\ML", docstring = "K-Means clustering of features passed as a single column containing numerical array")
kmeans_dynamic_fl(tbl:(*),k:int, features_col:string, cluster_col:string)
{
    let kwargs = bag_pack('k', k, 'features_col', features_col, 'cluster_col', cluster_col);
    let code = ```if 1:

        from sklearn.cluster import KMeans

        k = kargs["k"]
        features_col = kargs["features_col"]
        cluster_col = kargs["cluster_col"]

        df1 = df[features_col].apply(np.array)
        matrix = np.vstack(df1.values)
        kmeans = KMeans(n_clusters=k, random_state=0)
        kmeans.fit(matrix)
        result = df
        result[cluster_col] = kmeans.labels_
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Clustering of artificial dataset with three clusters

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let kmeans_dynamic_fl=(tbl:(*),k:int, features_col:string, cluster_col:string)
{
    let kwargs = bag_pack('k', k, 'features_col', features_col, 'cluster_col', cluster_col);
    let code = ```if 1:

        from sklearn.cluster import KMeans

        k = kargs["k"]
        features_col = kargs["features_col"]
        cluster_col = kargs["cluster_col"]

        df1 = df[features_col].apply(np.array)
        matrix = np.vstack(df1.values)
        kmeans = KMeans(n_clusters=k, random_state=0)
        kmeans.fit(matrix)
        result = df
        result[cluster_col] = kmeans.labels_
    ```;
    tbl
    | evaluate python(typeof(*),code, kwargs)
};
union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke kmeans_dynamic_fl(3, "Features", "cluster_id")
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)

Stored

union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| project Features=pack_array(x, y), cluster_id=int(null)
| invoke kmeans_dynamic_fl(3, "Features", "cluster_id")
| extend x=toreal(Features[0]), y=toreal(Features[1])
| render scatterchart with(series=cluster_id)

Screenshot of scatterchart of K-Means clustering of artificial dataset with three clusters.

5.13 - kmeans_fl()

This article describes the kmeans_fl() user-defined function.

The function kmeans_fl() is a UDF (user-defined function) that clusterizes a dataset using the k-means algorithm.

Syntax

T | invoke kmeans_fl(k, features, cluster_col)

Parameters

Name | Type | Required | Description
k | int | ✔️ | The number of clusters.
features | dynamic | ✔️ | An array containing the names of the features columns to use for clustering.
cluster_col | string | ✔️ | The name of the column to store the output cluster ID for each record.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let kmeans_fl=(tbl:(*), k:int, features:dynamic, cluster_col:string)
{
    let kwargs = bag_pack('k', k, 'features', features, 'cluster_col', cluster_col);
    let code = ```if 1:

        from sklearn.cluster import KMeans

        k = kargs["k"]
        features = kargs["features"]
        cluster_col = kargs["cluster_col"]

        km = KMeans(n_clusters=k)
        df1 = df[features]
        km.fit(df1)
        result = df
        result[cluster_col] = km.labels_
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create function with (folder = "Packages\\ML", docstring = "K-Means clustering")
kmeans_fl(tbl:(*), k:int, features:dynamic, cluster_col:string)
{
    let kwargs = bag_pack('k', k, 'features', features, 'cluster_col', cluster_col);
    let code = ```if 1:

        from sklearn.cluster import KMeans

        k = kargs["k"]
        features = kargs["features"]
        cluster_col = kargs["cluster_col"]

        km = KMeans(n_clusters=k)
        df1 = df[features]
        km.fit(df1)
        result = df
        result[cluster_col] = km.labels_
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Clusterize artificial dataset with three clusters

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let kmeans_fl=(tbl:(*), k:int, features:dynamic, cluster_col:string)
{
    let kwargs = bag_pack('k', k, 'features', features, 'cluster_col', cluster_col);
    let code = ```if 1:

        from sklearn.cluster import KMeans

        k = kargs["k"]
        features = kargs["features"]
        cluster_col = kargs["cluster_col"]

        km = KMeans(n_clusters=k)
        df1 = df[features]
        km.fit(df1)
        result = df
        result[cluster_col] = km.labels_
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke kmeans_fl(3, pack_array("x", "y"), "cluster_id")
| render scatterchart with(series=cluster_id)

Stored

union 
(range x from 1 to 100 step 1 | extend x=rand()+3, y=rand()+2),
(range x from 101 to 200 step 1 | extend x=rand()+1, y=rand()+4),
(range x from 201 to 300 step 1 | extend x=rand()+2, y=rand()+6)
| extend cluster_id=int(null)
| invoke kmeans_fl(3, pack_array("x", "y"), "cluster_id")
| render scatterchart with(series=cluster_id)

Screenshot of scatterchart of K-Means clustering of artificial dataset with three clusters.

5.14 - ks_test_fl()

This article describes the ks_test_fl() user-defined function.

The function ks_test_fl() is a UDF (user-defined function) that performs the Kolmogorov Smirnov Test.

Syntax

T | invoke ks_test_fl(data1, data2, test_statistic, p_value)

Parameters

Name | Type | Required | Description
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test.
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test.
test_statistic | string | ✔️ | The name of the column to store the test statistic value for the results.
p_value | string | ✔️ | The name of the column to store the p-value for the results.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let ks_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.ks_2samp(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Kolmogorov Smirnov Test")
ks_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.ks_2samp(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let ks_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.ks_2samp(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke ks_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Stored

datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke ks_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Output

id | sample1 | sample2 | test_stat | p_val
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 0.66666666666666674 | 0.3197243332709643
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 1 | 0.03262165165202116
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 1 | 0.01106563701580386

5.15 - levene_test_fl()

This article describes the levene_test_fl() user-defined function.

The function levene_test_fl() is a UDF (user-defined function) that performs the Levene Test.

Syntax

T | invoke levene_test_fl(data1, data2, test_statistic, p_value)

Parameters

Name | Type | Required | Description
data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test.
data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test.
test_statistic | string | ✔️ | The name of the column to store the test statistic value for the results.
p_value | string | ✔️ | The name of the column to store the p-value for the results.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let levene_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.levene(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Levene Test")
levene_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.levene(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let levene_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.levene(row[data1], row[data2])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke levene_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Stored

datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke levene_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Output

id | sample1 | sample2 | test_stat | p_val
Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | 1.5587395987367387 | 0.27993504690044563
Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | 1.6402495788130482 | 0.26950872948841353
Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | 0.0032989690721642395 | 0.95606240301049072

5.16 - log_reduce_fl()

Learn how to use the log_reduce_fl() function to find common patterns in semi-structured textual columns.

The function log_reduce_fl() finds common patterns in semi-structured textual columns, such as log lines, and clusters the lines according to the extracted patterns. It outputs a summary table containing the found patterns sorted top down by their respective frequency.

Syntax

T | invoke log_reduce_fl(reduce_col [, use_logram [, use_drain [, custom_regexes [, custom_regexes_policy [, delimiters [, similarity_th [, tree_depth [, trigram_th [, bigram_th ]]]]]]]]])

Parameters

The following parameter descriptions are a summary. For more information, see the More about the algorithm section.

Name | Type | Required | Description
reduce_col | string | ✔️ | The name of the string column the function is applied to.
use_logram | bool | | Enable or disable the Logram algorithm. Default value is true.
use_drain | bool | | Enable or disable the Drain algorithm. Default value is true.
custom_regexes | dynamic | | A dynamic array containing pairs of regular expression and replacement symbols to be searched in each input row, and replaced with their respective matching symbol. Default value is dynamic([]). The default regex table replaces numbers, IP addresses, and GUIDs.
custom_regexes_policy | string | | Either 'prepend', 'append', or 'replace'. Controls whether custom_regexes are prepended to, appended to, or replace the default ones. Default value is 'prepend'.
delimiters | dynamic | | A dynamic array containing delimiter strings. Default value is dynamic([" "]), defining space as the only single-character delimiter.
similarity_th | real | | Similarity threshold, used by the Drain algorithm. Increasing similarity_th results in more refined clusters. Default value is 0.5. If Drain is disabled, this parameter has no effect.
tree_depth | int | | Increasing tree_depth improves the runtime of the Drain algorithm, but might reduce its accuracy. Default value is 4. If Drain is disabled, this parameter has no effect.
trigram_th | int | | Decreasing trigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 10. If Logram is disabled, this parameter has no effect.
bigram_th | int | | Decreasing bigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 15. If Logram is disabled, this parameter has no effect.

More about the algorithm

The function runs multiple passes over the rows to be reduced to common patterns. The following list explains the passes:

  • Regular expression replacements: In this pass, each line is independently matched against a set of regular expressions, and each matched expression is replaced by a replacement symbol. The default regular expressions replace IP addresses, numbers, and GUIDs with <IP>, <NUM>, and <GUID>. You can prepend or append more regular expressions to the defaults, or replace them with new ones or with an empty list, by modifying custom_regexes and custom_regexes_policy. For example, to replace whole numbers with <WNUM> set custom_regexes=pack_array('/^\d+$/', '<WNUM>'); to cancel regular expression replacement set custom_regexes_policy='replace'. For each line, the function keeps a list of the original expressions (before replacements) to be output as parameters of the generic replacement tokens. A usage sketch follows this list.

  • Tokenization: Similar to the previous pass, each line is processed independently and broken into tokens based on a set of delimiters. For example, to break lines into tokens by comma, period, or semicolon, set delimiters=pack_array(',', '.', ';').

  • Apply Logram algorithm: this pass is optional, and runs only if use_logram is true. We recommend using Logram when large scale is required, and when parameters can appear in the first tokens of the log entry. On the other hand, disable it when the log entries are short, as the algorithm tends to replace tokens with wildcards too often in such cases. The Logram algorithm considers 3-tuples and 2-tuples of tokens. If a 3-tuple of tokens is common in the log lines (it appears more than trigram_th times), then it’s likely that all three tokens are part of the pattern. If the 3-tuple is rare, then it’s likely that it contains a variable that should be replaced by a wildcard. For rare 3-tuples, we consider the frequency with which 2-tuples contained in the 3-tuple appear. If a 2-tuple is common (it appears more than bigram_th times), then the remaining token is likely to be a parameter, and not part of the pattern.
    The Logram algorithm is easy to parallelize. It requires two passes on the log corpus: the first one to count the frequency of each 3-tuple and 2-tuple, and the second one to apply the logic previously described to each entry. To parallelize the algorithm, we only need to partition the log entries, and unify the frequency counts of different workers.

  • Apply Drain algorithm: this pass is optional, and runs only if use_drain is true. Drain is a log parsing algorithm based on a truncated depth prefix tree. Log messages are split according to their length, and for each length the first tree_depth tokens of the log message are used to build a prefix tree. If no match for the prefix tokens was found, a new branch is created. If a match for the prefix was found, we search for the most similar pattern among the patterns contained in the tree leaf. Pattern similarity is measured by the ratio of matched nonwildcard tokens out of all tokens. If the similarity of the most similar pattern is above the similarity threshold (the parameter similarity_th), then the log entry is matched to the pattern. For that pattern, the function replaces all nonmatching tokens by wildcards. If the similarity of the most similar pattern is below the similarity threshold, a new pattern containing the log entry is created.
    We set the default tree_depth to 4 based on testing various logs. Increasing this depth can improve runtime but might degrade pattern accuracy; decreasing it makes parsing more accurate but slower, as each node performs many more similarity tests.
    Usually, Drain efficiently generalizes and reduces patterns (though it’s hard to parallelize). However, as it relies on a prefix tree, it might not be optimal for log entries containing parameters in the first tokens. This can be resolved in most cases by applying Logram first.
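
The following sketch shows how these knobs can be combined in a single call. It assumes the HDFS_log table used in the example below; the whole-number regular expression is illustrative and might need adjusting for your data:

HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data"
                     , custom_regexes=pack_array('^\\d+$', '<WNUM>')  // pairs of regular expression and replacement symbol
                     , custom_regexes_policy='prepend'                // prepended to the default regexes; 'append' and 'replace' are also valid
                     , delimiters=pack_array(' ', ',', ';'))          // tokenize on space, comma, and semicolon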

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let log_reduce_fl=(tbl:(*), reduce_col:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'summary');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Find common patterns in textual logs, output a summary table')
log_reduce_fl(tbl:(*), reduce_col:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'summary');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
}

Example

The following example uses the invoke operator to run the function. This example uses Apache Hadoop distributed file system logs.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let log_reduce_fl=(tbl:(*), reduce_col:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'summary');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data")

Stored

//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data")

Output

CountLogReduceExample
55356081110<NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.delete: blk_<NUM> is added to invalidSet of <IP> 081110 220623 26 INFO dfs.FSNamesystem: BLOCK* NameSystem.delete: blk_1239016582509138045 is added to invalidSet of 10.251.123.195:50010
10278081110<NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: <IP> is added to blk_<NUM> size <NUM> 081110 215858 27 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.250.11.85:50010 is added to blk_5080254298708411681 size 67108864
10256081110<NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating 081110 215858 15496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating
10256081110<NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> 081110 215858 15485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21
9140081110<NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> 081110 215858 15494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010
3047081110<NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand3/temporary/task<NUM><NUM>m<NUM>_<NUM>/part-<NUM>. <> 081110 215858 26 INFO dfs.FSNamesystem: BLOCK NameSystem.allocateBlock: /user/root/rand3/_temporary/task_200811101024_0005_m_001805_0/part-01805. blk-7037346755429293022
1402081110<NUM> <NUM> INFO <>: <> block blk_<NUM> <> <> 081110 215957 15556 INFO dfs.DataNode$DataTransfer: 10.250.15.198:50010:Transmitted block blk_-3782569120714539446 to /10.251.203.129:50010
177081110<NUM> <NUM> INFO <>: <> <> <> <*> 081110 215859 13 INFO dfs.DataBlockScanner: Verification succeeded for blk_-7244926816084627474
36081110<NUM> <NUM> INFO <>: <> <> <> for block <*> 081110 215924 15636 INFO dfs.DataNode$BlockReceiver: Receiving empty packet for block blk_3991288654265301939
12081110<NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* <> <> <> <> <> <> <> <> 081110 215953 19 INFO dfs.FSNamesystem: BLOCK* ask 10.250.15.198:50010 to replicate blk_-3782569120714539446 to datanode(s) 10.251.203.129:50010
12081110<NUM> <NUM> INFO <>: <> <> <> <> <> block blk_<NUM> <> <> 081110 215955 18 INFO dfs.DataNode: 10.250.15.198:50010 Starting thread to transfer block blk_-3782569120714539446 to 10.251.203.129:50010
12081110<NUM> <NUM> INFO dfs.DataNode$DataXceiver: Received block blk_<NUM> src: <IP> dest: <IP> of size <NUM> 081110 215957 15226 INFO dfs.DataNode$DataXceiver: Received block blk_-3782569120714539446 src: /10.250.15.198:51013 dest: /10.250.15.198:50010 of size 14474705
6081110<NUM> <NUM> <> dfs.FSNamesystem: BLOCK NameSystem.addStoredBlock: <> <> <> <> <> <> <> <> size <NUM> 081110 215924 27 WARN dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_2522553781740514003 on 10.251.202.134:50010 size 67108864
6081110<NUM> <NUM> INFO dfs.DataNode$DataXceiver: <> <> <> <> <>: <> <> <> <> <> 081110 215936 15714 INFO dfs.DataNode$DataXceiver: writeBlock blk_720939897861061328 received exception java.io.IOException: Couldn’t read from stream
3081110<NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: <> <> <> <> <> <> <> size <NUM> <> <> <> <> <> <> <> <>. 081110 220635 28 INFO dfs.FSNamesystem: BLOCK NameSystem.addStoredBlock: addStoredBlock request received for blk_-81196479666306310 on 10.250.17.177:50010 size 53457811 But it doesn’t belong to any file.
1081110<NUM> <NUM> <> <>: <> <> <> <> <> <> <>. <> <> <> <> <>. 081110 220631 19 WARN dfs.FSDataset: Unexpected error trying to delete block blk_-2012154052725261337. BlockInfo not found in volumeMap.
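The summary output above is a regular tabular result, so it can be filtered and sorted like any other query output. As a minimal sketch, the following query (reusing the same HDFS_log sample) keeps only the infrequent patterns, which are often the more interesting ones to investigate:

// Keep only patterns that matched fewer than 100 of the sampled lines, rarest first
HDFS_log
| take 100000
| invoke log_reduce_fl(reduce_col="data")
| where Count < 100
| sort by Count asc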

5.17 - log_reduce_full_fl()

This article describes the log_reduce_full_fl() user-defined function.

The function log_reduce_full_fl() finds common patterns in semi-structured textual columns, such as log lines, and clusters the lines according to the extracted patterns. The function’s algorithm and most of its parameters are identical to log_reduce_fl(). However, log_reduce_fl() outputs a patterns summary table, whereas this function outputs a full table containing the pattern and parameters for each line.

Syntax

T | invoke log_reduce_full_fl(reduce_col [, pattern_col [, parameters_col [, use_logram [, use_drain [, custom_regexes [, custom_regexes_policy [, delimiters [, similarity_th [, tree_depth [, trigram_th [, bigram_th ]]]]]]]]]]])

Parameters

The following parameter descriptions are a summary. For more information, see the More about the algorithm section.

NameTypeRequiredDescription
reduce_colstring✔️The name of the string column the function is applied to.
pattern_colstring✔️The name of the string column to populate the pattern.
parameters_colstring✔️The name of the string column to populate the pattern’s parameters.
use_logramboolEnable or disable the Logram algorithm. Default value is true.
use_drainboolEnable or disable the Drain algorithm. Default value is true.
custom_regexesdynamicA dynamic array containing pairs of regular expressions and replacement symbols; each regular expression is searched for in every input row, and matches are replaced with the respective symbol. Default value is dynamic([]). The default regex table replaces numbers, IPs, and GUIDs. For a usage sketch, see the example after this table.
custom_regexes_policystringEither ‘prepend’, ‘append’, or ‘replace’. Controls whether the custom_regexes are prepended to, appended to, or replace the default ones. Default value is ‘prepend’.
delimitersdynamicA dynamic array containing delimiter strings. Default value is dynamic([" "]), defining space as the only single character delimiter.
similarity_threalSimilarity threshold, used by the Drain algorithm. Increasing similarity_th results in more refined clusters. Default value is 0.5. If Drain is disabled, then this parameter has no effect.
tree_depthintIncreasing tree_depth improves the runtime of the Drain algorithm, but might reduce its accuracy. Default value is 4. If Drain is disabled, then this parameter has no effect.
trigram_thintDecreasing trigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 10. If Logram is disabled, then this parameter has no effect.
bigram_thintDecreasing bigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 15. If Logram is disabled, then this parameter has no effect.
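The following minimal sketch shows how custom_regexes and custom_regexes_policy can be passed to extend the default regex table. The email-masking pair is a hypothetical addition, and the rest of the invocation follows the example later in this section:

// Mask email addresses in addition to the default number/IP/GUID replacements (hypothetical regex pair)
HDFS_log
| take 100000
| extend Patterns="", Parameters=""
| invoke log_reduce_full_fl(reduce_col="data", pattern_col="Patterns", parameters_col="Parameters",
                            custom_regexes=dynamic(['[\\w\\.-]+@[\\w\\.-]+', '<EMAIL>']),
                            custom_regexes_policy='prepend')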

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let log_reduce_full_fl=(tbl:(*), reduce_col:string, pattern_col:string, parameters_col:string,
                   use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
                   delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', pattern_col, 'parameters_column', parameters_col, 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'full');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Find common patterns in textual logs, output a full table')
log_reduce_full_fl(tbl:(*), reduce_col:string, pattern_col:string, parameters_col:string,
                   use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
                   delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', pattern_col, 'parameters_column', parameters_col, 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'full');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let log_reduce_full_fl=(tbl:(*), reduce_col:string, pattern_col:string, parameters_col:string,
                   use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
                   delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', pattern_col, 'parameters_column', parameters_col, 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'full');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| extend Patterns="", Parameters=""
| invoke log_reduce_full_fl(reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
| take 10

Stored

//
// Finding common patterns in HDFS logs, a commonly used benchmark for log parsing
//
HDFS_log
| take 100000
| extend Patterns="", Parameters=""
| invoke log_reduce_full_fl(reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
| take 10

Output

dataPatternsParameters
08111021585815485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15485"”, ““parameter_2"”: ““5080254298708411681"”, ““parameter_3"”: ““67108864"”, ““parameter_4"”: “"/10.251.43.21"”}”
08111021585815494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15494"”, ““parameter_2"”: “"-7037346755429293022"”, ““parameter_3"”: “"/10.251.43.21:45933"”, ““parameter_4"”: “"/10.251.43.21:50010"”}”
08111021585815496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15496"”, ““parameter_2"”: ““2"”, ““parameter_3"”: “"-7746692545918257727"”}”
08111021585815496 INFO dfs.DataNode$PacketResponder: Received block blk_-7746692545918257727 of size 67108864 from /10.251.107.227 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15496"”, ““parameter_2"”: “"-7746692545918257727"”, ““parameter_3"”: ““67108864"”, ““parameter_4"”: “"/10.251.107.227"”}”
08111021585815511 INFO dfs.DataNode$DataXceiver: Receiving block blk_-8578644687709935034 src: /10.251.107.227:39600 dest: /10.251.107.227:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15511"”, ““parameter_2"”: “"-8578644687709935034"”, ““parameter_3"”: “"/10.251.107.227:39600"”, ““parameter_4"”: “"/10.251.107.227:50010"”}”
08111021585815514 INFO dfs.DataNode$DataXceiver: Receiving block blk_722881101738646364 src: /10.251.75.79:58213 dest: /10.251.75.79:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15514"”, ““parameter_2"”: ““722881101738646364"”, ““parameter_3"”: “"/10.251.75.79:58213"”, ““parameter_4"”: “"/10.251.75.79:50010"”}”
08111021585815517 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7110736255599716271 terminating 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15517"”, ““parameter_2"”: ““2"”, ““parameter_3"”: “"-7110736255599716271"”}”
08111021585815517 INFO dfs.DataNode$PacketResponder: Received block blk_-7110736255599716271 of size 67108864 from /10.251.42.246 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15517"”, ““parameter_2"”: “"-7110736255599716271"”, ““parameter_3"”: ““67108864"”, ““parameter_4"”: “"/10.251.42.246"”}”
08111021585815533 INFO dfs.DataNode$DataXceiver: Receiving block blk_7257432994295824826 src: /10.251.26.8:41803 dest: /10.251.26.8:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15533"”, ““parameter_2"”: ““7257432994295824826"”, ““parameter_3"”: “"/10.251.26.8:41803"”, ““parameter_4"”: “"/10.251.26.8:50010"”}”
08111021585815533 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7771332301119265281 src: /10.251.43.210:34258 dest: /10.251.43.210:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> “{““parameter_0"”: ““215858"”, ““parameter_1"”: ““15533"”, ““parameter_2"”: “"-7771332301119265281"”, ““parameter_3"”: “"/10.251.43.210:34258"”, ““parameter_4"”: “"/10.251.43.210:50010"”}”
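Because the full output keeps the pattern for every line, it can be rolled back up into a pattern summary similar to what log_reduce_fl() returns. A minimal sketch, using the same columns as the example above:

// Aggregate the per-line output into pattern counts, with one sample line per pattern
HDFS_log
| take 100000
| extend Patterns="", Parameters=""
| invoke log_reduce_full_fl(reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
| summarize Count = count(), example = take_any(data) by Patterns
| sort by Count desc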

5.18 - log_reduce_predict_fl()

This article describes the log_reduce_predict_fl() user-defined function.

The function log_reduce_predict_fl() parses semi-structured textual columns, such as log lines, and for each line matches the respective pattern from a pretrained model, or reports an anomaly if no matching pattern was found. The function’s output is similar to that of log_reduce_fl(), though the patterns are retrieved from a pretrained model that was generated by log_reduce_train_fl().

Syntax

T | invoke log_reduce_predict_fl(models_tbl, model_name, reduce_col [, anomaly_str ])

Parameters

NameTypeRequiredDescription
models_tbltable✔️A table containing models generated by log_reduce_train_fl(). The table’s schema should be (name:string, timestamp: datetime, model:string).
model_namestring✔️The name of the model to retrieve from models_tbl. If the table contains several models matching the model name, the latest one is used (see the sketch after this table).
reduce_colstring✔️The name of the string column the function is applied to.
anomaly_strstringThis string is output for lines that have no matched pattern in the model. Default value is “ANOMALY”.
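The following minimal sketch mirrors the selection logic used inside the function (top 1 by timestamp desc) to check which stored model would be picked for a given name. It assumes the ML_Models table and the HDFS_100K model created by the log_reduce_train_fl() example later in this article:

// Show the model entry that log_reduce_predict_fl() would use for this model name
ML_Models
| where name == "HDFS_100K"
| top 1 by timestamp desc
| project name, timestamp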

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let log_reduce_predict_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string), 
                      model_name:string, reduce_col:string, anomaly_str: string = 'ANOMALY')
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', 'LogReduce','output_parameters_col', '', 
                          'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'summary');
    let code = ```if 1:
        from log_cluster import log_reduce_predict
        result = log_reduce_predict.log_reduce_predict(df, kargs)
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Apply a trained model to find common patterns in textual logs, output a summary table')
log_reduce_predict_fl(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string), 
                      model_name:string, reduce_col:string, anomaly_str: string = 'ANOMALY')
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', 'LogReduce','output_parameters_col', '', 
                          'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'summary');
    let code = ```if 1:
        from log_cluster import log_reduce_predict
        result = log_reduce_predict.log_reduce_predict(df, kargs)
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let log_reduce_predict_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string), 
                      model_name:string, reduce_col:string, anomaly_str: string = 'ANOMALY')
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', 'LogReduce','output_parameters_col', '', 
                          'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'summary');
    let code = ```if 1:
        from log_cluster import log_reduce_predict
        result = log_reduce_predict.log_reduce_predict(df, kargs)
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(Count:int, LogReduce:string, example:string), code, kwargs)
};
HDFS_log_100k
| take 1000
| invoke log_reduce_predict_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data")

Stored

HDFS_log_100k
| take 1000
| invoke log_reduce_predict_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data")

Output

CountLogReduceexample
239081110<NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> 081110 215858 15494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010
231081110<NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> 081110 215858 15485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21
230081110<NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating 081110 215858 15496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating
218081110<NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: <IP> is added to blk_<NUM> size <NUM> 081110 215858 27 INFO dfs.FSNamesystem: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.250.11.85:50010 is added to blk_5080254298708411681 size 67108864
79081110<NUM> <NUM> INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: <>. <> 081110 215858 26 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/rand3/_temporary/task_200811101024_0005_m_001805_0/part-01805. blk-7037346755429293022
3081110<NUM> <NUM> INFO dfs.DataBlockScanner: Verification succeeded for <*> 081110 215859 13 INFO dfs.DataBlockScanner: Verification succeeded for blk_-7244926816084627474

5.19 - log_reduce_predict_full_fl()

This article describes the log_reduce_predict_full_fl() user-defined function.

The function log_reduce_predict_full_fl() parses semi-structured textual columns, such as log lines, and for each line matches the respective pattern from a pretrained model, or reports an anomaly if no matching pattern was found. The patterns are retrieved from a pretrained model, generated by log_reduce_train_fl(). The function is similar to log_reduce_predict_fl(), but unlike log_reduce_predict_fl(), which outputs a patterns summary table, this function outputs a full table containing the pattern and parameters for each line.

Syntax

T | invoke log_reduce_predict_full_fl(models_tbl, model_name, reduce_col, pattern_col, parameters_col [, anomaly_str ])

Parameters

NameTypeRequiredDescription
models_tbltable✔️A table containing models generated by log_reduce_train_fl(). The table’s schema should be (name:string, timestamp: datetime, model:string).
model_namestring✔️The name of the model to retrieve from models_tbl. If the table contains several models matching the model name, the latest one is used.
reduce_colstring✔️The name of the string column the function is applied to.
pattern_colstring✔️The name of the string column to populate the pattern.
parameters_colstring✔️The name of the string column to populate the pattern’s parameters.
anomaly_strstringThis string is output for lines that have no matched pattern in the model. Default value is “ANOMALY”. The sketch after this table shows how to filter for these lines.
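Because the pattern is written out for every line, lines with no matching pattern can be isolated by filtering the pattern column for anomaly_str (this sketch assumes the anomaly string is written to the pattern column). It also assumes the ML_Models table and the HDFS_100K model created by the log_reduce_train_fl() example later in this article:

// Keep only lines whose pattern was not found in the trained model
HDFS_log_100k
| take 1000
| extend Patterns='', Parameters=''
| invoke log_reduce_predict_full_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")
| where Patterns == "ANOMALY"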

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let log_reduce_predict_full_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string), 
                           model_name:string, reduce_col:string, pattern_col:string, parameters_col:string, 
                           anomaly_str: string = 'ANOMALY')
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', pattern_col,'output_parameters_col', 
                          parameters_col, 'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'full');
    let code = ```if 1:
        from log_cluster import log_reduce_predict
        result = log_reduce_predict.log_reduce_predict(df, kargs)
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Apply a trained model to find common patterns in textual logs, output a full table')
log_reduce_predict_full_fl(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string), 
                           model_name:string, reduce_col:string, pattern_col:string, parameters_col:string, 
                           anomaly_str: string = 'ANOMALY')
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', pattern_col,'output_parameters_col', 
                          parameters_col, 'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'full');
    let code = ```if 1:
        from log_cluster import log_reduce_predict
        result = log_reduce_predict.log_reduce_predict(df, kargs)
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let log_reduce_predict_full_fl=(tbl:(*), models_tbl: (name:string, timestamp: datetime, model:string), 
                           model_name:string, reduce_col:string, pattern_col:string, parameters_col:string, 
                           anomaly_str: string = 'ANOMALY')
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('logs_col', reduce_col, 'output_patterns_col', pattern_col,'output_parameters_col', 
                          parameters_col, 'model', model_str, 'anomaly_str', anomaly_str, 'output_type', 'full');
    let code = ```if 1:
        from log_cluster import log_reduce_predict
        result = log_reduce_predict.log_reduce_predict(df, kargs)
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
HDFS_log_100k
| extend Patterns='', Parameters=''
| take 10
| invoke log_reduce_predict_full_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")

Stored

HDFS_log_100k
| extend Patterns='', Parameters=''
| take 10
| invoke log_reduce_predict_full_fl(models_tbl=ML_Models, model_name="HDFS_100K", reduce_col="data", pattern_col="Patterns", parameters_col="Parameters")

Output

dataPatternsParameters
08111021585815485 INFO dfs.DataNode$PacketResponder: Received block blk_5080254298708411681 of size 67108864 from /10.251.43.21 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> {“parameter_0”: “215858”, “parameter_1”: “15485”, “parameter_2”: “5080254298708411681”, “parameter_3”: “67108864”, “parameter_4”: “/10.251.43.21”}
08111021585815494 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7037346755429293022 src: /10.251.43.21:45933 dest: /10.251.43.21:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> {“parameter_0”: “215858”, “parameter_1”: “15494”, “parameter_2”: “-7037346755429293022”, “parameter_3”: “/10.251.43.21:45933”, “parameter_4”: “/10.251.43.21:50010”}
08111021585815496 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7746692545918257727 terminating 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating {“parameter_0”: “215858”, “parameter_1”: “15496”, “parameter_2”: “2”, “parameter_3”: “-7746692545918257727”}
08111021585815496 INFO dfs.DataNode$PacketResponder: Received block blk_-7746692545918257727 of size 67108864 from /10.251.107.227 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> {“parameter_0”: “215858”, “parameter_1”: “15496”, “parameter_2”: “-7746692545918257727”, “parameter_3”: “67108864”, “parameter_4”: “/10.251.107.227”}
08111021585815511 INFO dfs.DataNode$DataXceiver: Receiving block blk_-8578644687709935034 src: /10.251.107.227:39600 dest: /10.251.107.227:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> {“parameter_0”: “215858”, “parameter_1”: “15511”, “parameter_2”: “-8578644687709935034”, “parameter_3”: “/10.251.107.227:39600”, “parameter_4”: “/10.251.107.227:50010”}
08111021585815514 INFO dfs.DataNode$DataXceiver: Receiving block blk_722881101738646364 src: /10.251.75.79:58213 dest: /10.251.75.79:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> {“parameter_0”: “215858”, “parameter_1”: “15514”, “parameter_2”: “722881101738646364”, “parameter_3”: “/10.251.75.79:58213”, “parameter_4”: “/10.251.75.79:50010”}
08111021585815517 INFO dfs.DataNode$PacketResponder: PacketResponder 2 for block blk_-7110736255599716271 terminating 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: PacketResponder <NUM> for block blk_<NUM> terminating {“parameter_0”: “215858”, “parameter_1”: “15517”, “parameter_2”: “2”, “parameter_3”: “-7110736255599716271”}
08111021585815517 INFO dfs.DataNode$PacketResponder: Received block blk_-7110736255599716271 of size 67108864 from /10.251.42.246 081110 <NUM> <NUM> INFO dfs.DataNode$PacketResponder: Received block blk_<NUM> of size <NUM> from <IP> {“parameter_0”: “215858”, “parameter_1”: “15517”, “parameter_2”: “-7110736255599716271”, “parameter_3”: “67108864”, “parameter_4”: “/10.251.42.246”}
08111021585815533 INFO dfs.DataNode$DataXceiver: Receiving block blk_7257432994295824826 src: /10.251.26.8:41803 dest: /10.251.26.8:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> {“parameter_0”: “215858”, “parameter_1”: “15533”, “parameter_2”: “7257432994295824826”, “parameter_3”: “/10.251.26.8:41803”, “parameter_4”: “/10.251.26.8:50010”}
08111021585815533 INFO dfs.DataNode$DataXceiver: Receiving block blk_-7771332301119265281 src: /10.251.43.210:34258 dest: /10.251.43.210:50010 081110 <NUM> <NUM> INFO dfs.DataNode$DataXceiver: Receiving block blk_<NUM> src: <IP> dest: <IP> {“parameter_0”: “215858”, “parameter_1”: “15533”, “parameter_2”: “-7771332301119265281”, “parameter_3”: “/10.251.43.210:34258”, “parameter_4”: “/10.251.43.210:50010”}

5.20 - log_reduce_train_fl()

This article describes the log_reduce_train_fl() user-defined function.

The function log_reduce_train_fl() finds common patterns in semi-structured textual columns, such as log lines, and clusters the lines according to the extracted patterns. The function’s algorithm and most of its parameters are identical to log_reduce_fl(), but unlike log_reduce_fl(), which outputs a patterns summary table, this function outputs a serialized model. The model can then be used by log_reduce_predict_fl() or log_reduce_predict_full_fl() to predict the matched pattern for new log lines.

Syntax

T | invoke log_reduce_train_fl(reduce_col, model_name [, use_logram [, use_drain [, custom_regexes [, custom_regexes_policy [, delimiters [, similarity_th [, tree_depth [, trigram_th [, bigram_th ]]]]]]]]])

Parameters

The following parameter descriptions are a summary. For more information, see the More about the algorithm section.

NameTypeRequiredDescription
reduce_colstring✔️The name of the string column the function is applied to.
model_namestring✔️The name of the output model.
use_logramboolEnable or disable the Logram algorithm. Default value is true.
use_drainboolEnable or disable the Drain algorithm. Default value is true.
custom_regexesdynamicA dynamic array containing pairs of regular expressions and replacement symbols; each regular expression is searched for in every input row, and matches are replaced with the respective symbol. Default value is dynamic([]). The default regex table replaces numbers, IPs, and GUIDs.
custom_regexes_policystringEither ‘prepend’, ‘append’, or ‘replace’. Controls whether the custom_regexes are prepended to, appended to, or replace the default ones. Default value is ‘prepend’.
delimitersdynamicA dynamic array containing delimiter strings. Default value is dynamic([" "]), defining space as the only single character delimiter.
similarity_threalSimilarity threshold, used by the Drain algorithm. Increasing similarity_th results in more refined clusters. Default value is 0.5. If Drain is disabled, then this parameter has no effect.
tree_depthintIncreasing tree_depth improves the runtime of the Drain algorithm, but might reduce its accuracy. Default value is 4. If Drain is disabled, then this parameter has no effect.
trigram_thintDecreasing trigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 10. If Logram is disabled, then this parameter has no effect.
bigram_thintDecreasing bigram_th increases the chances of Logram to replace tokens with wildcards. Default value is 15. If Logram is disabled, then this parameter has no effect.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let log_reduce_train_fl=(tbl:(*), reduce_col:string, model_name:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'model');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(model:string), code, kwargs)
    | project name=model_name, timestamp=now(), model
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = 'Packages\\Text', docstring = 'Find common patterns in textual logs, output a model')
log_reduce_train_fl(tbl:(*), reduce_col:string, model_name:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'model');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(model:string), code, kwargs)
    | project name=model_name, timestamp=now(), model
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

//
// Finding common patterns in HDFS logs, export and store the trained model in ML_Models table
//
.set-or-append ML_Models <|
//
let log_reduce_train_fl=(tbl:(*), reduce_col:string, model_name:string,
              use_logram:bool=True, use_drain:bool=True, custom_regexes: dynamic = dynamic([]), custom_regexes_policy: string = 'prepend',
              delimiters:dynamic = dynamic(' '), similarity_th:double=0.5, tree_depth:int = 4, trigram_th:int=10, bigram_th:int=15)
{
    let default_regex_table = pack_array('(/|)([0-9]+\\.){3}[0-9]+(:[0-9]+|)(:|)', '<IP>', 
                                         '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})', '<GUID>', 
                                         '(?<=[^A-Za-z0-9])(\\-?\\+?\\d+)(?=[^A-Za-z0-9])|[0-9]+$', '<NUM>');
    let kwargs = bag_pack('reduced_column', reduce_col, 'delimiters', delimiters,'output_column', 'LogReduce', 'parameters_column', '', 
                          'trigram_th', trigram_th, 'bigram_th', bigram_th, 'default_regexes', default_regex_table, 
                          'custom_regexes', custom_regexes, 'custom_regexes_policy', custom_regexes_policy, 'tree_depth', tree_depth, 'similarity_th', similarity_th, 
                          'use_drain', use_drain, 'use_logram', use_logram, 'save_regex_tuples_in_output', True, 'regex_tuples_column', 'RegexesColumn', 
                          'output_type', 'model');
    let code = ```if 1:
        from log_cluster import log_reduce
        result = log_reduce.log_reduce(df, kargs)
    ```;
    tbl
    | extend LogReduce=''
    | evaluate python(typeof(model:string), code, kwargs)
    | project name=model_name, timestamp=now(), model
};
HDFS_log_100k
| take 100000
| invoke log_reduce_train_fl(reduce_col="data", model_name="HDFS_100K")

Stored

//
// Finding common patterns in HDFS logs, export and store the trained model in ML_Models table
//
.set-or-append ML_Models <|
//
HDFS_log_100k
| take 100000
| invoke log_reduce_train_fl(reduce_col="data", model_name="HDFS_100K")

Output

ExtentIdOriginalSizeExtentSizeCompressedSizeIndexSizeRowCount
3734a525-cc08-44b9-a992-72de97b324141038311546108347121
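The output is the ingestion result of the .set-or-append command. To confirm that the model itself landed in the models table, you can query it directly; a minimal sketch:

// Verify that the trained model was appended to ML_Models
ML_Models
| where name == "HDFS_100K"
| project name, timestamp, model_size=strlen(model)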

5.21 - mann_whitney_u_test_fl()

This article describes the mann_whitney_u_test_fl() user-defined function.

The function mann_whitney_u_test_fl() is a UDF (user-defined function) that performs the Mann-Whitney U Test.

Syntax

T | invoke mann_whitney_u_test_fl(data1, data2, test_statistic, p_value [, use_continuity ])

Parameters

NameTypeRequiredDescription
data1string✔️The name of the column containing the first set of data to be used for the test.
data2string✔️The name of the column containing the second set of data to be used for the test.
test_statisticstring✔️The name of the column to store test statistic value for the results.
p_valuestring✔️The name of the column to store p-value for the results.
use_continuityboolDetermines if a continuity correction (1/2) is applied. Default is true.
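The data columns are expected to hold each sample as a dynamic array, with one pair of samples per row, as in the example below. When the raw measurements are stored one value per row, they can be packed first with make_list(); a minimal sketch, assuming a hypothetical raw_measurements table with columns test_id, value_a, and value_b:

// Pack per-test samples into dynamic arrays, then run the test (hypothetical source table)
raw_measurements
| summarize sample1 = make_list(value_a), sample2 = make_list(value_b) by test_id
| extend test_stat = 0.0, p_val = 0.0
| invoke mann_whitney_u_test_fl('sample1', 'sample2', 'test_stat', 'p_val')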

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let mann_whitney_u_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, use_continuity:bool=true)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'use_continuity', use_continuity);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        use_continuity = kargs["use_continuity"]
        def func(row):
            statistics = stats.mannwhitneyu(row[data1], row[data2], use_continuity=use_continuity)
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
        ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Mann-Whitney U Test")
mann_whitney_u_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, use_continuity:bool=true)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'use_continuity', use_continuity);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        use_continuity = kargs["use_continuity"]
        def func(row):
            statistics = stats.mannwhitneyu(row[data1], row[data2], use_continuity=use_continuity)
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
        ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let mann_whitney_u_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, use_continuity:bool=true)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'use_continuity', use_continuity);
    let code = ```if 1:
        from scipy import stats
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        use_continuity = kargs["use_continuity"]
        def func(row):
            statistics = stats.mannwhitneyu(row[data1], row[data2], use_continuity=use_continuity)
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
        ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke mann_whitney_u_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Stored

datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke mann_whitney_u_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Output

idsample1sample2test_statp_val
Test #1[23.64, 20.57, 20.42][27.1, 22.12, 33.56]10.095215131912761986
Test #2[20.85, 21.89, 23.41][35.09, 30.02, 26.52]00.04042779918502612
Test #3[20.13, 20.5, 21.7, 22.02][32.2, 32.79, 33.9, 34.22]00.015191410988288745

5.22 - normality_test_fl()

This article describes the normality_test_fl() user-defined function.

The function normality_test_fl() is a UDF (user-defined function) that performs the Normality Test.

Syntax

T | invoke normality_test_fl(data, test_statistic, p_value)

Parameters

NameTypeRequiredDescription
datastring✔️The name of the column containing the data to be used for the test.
test_statisticstring✔️The name of the column to store test statistic value for the results.
p_valuestring✔️The name of the column to store p-value for the results.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let normality_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data = kargs["data"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.normaltest(row[data])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Normality Test")
normality_test_fl(tbl:(*), data:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data = kargs["data"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.normaltest(row[data])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let normality_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data = kargs["data"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.normaltest(row[data])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42, 27.1, 22.12, 33.56, 23.64, 20.57]),
'Test #2', dynamic([20.85, 21.89, 23.41, 35.09, 30.02, 26.52, 20.85, 21.89]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02, 32.2, 32.79, 33.9, 34.22, 20.13, 20.5])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke normality_test_fl('sample1', 'test_stat', 'p_val')

Stored

datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42, 27.1, 22.12, 33.56, 23.64, 20.57]),
'Test #2', dynamic([20.85, 21.89, 23.41, 35.09, 30.02, 26.52, 20.85, 21.89]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02, 32.2, 32.79, 33.9, 34.22, 20.13, 20.5])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke normality_test_fl('sample1', 'test_stat', 'p_val')

Output

idsample1test_statp_val
Test #1[23.64, 20.57, 20.42, 27.1, 22.12, 33.56, 23.64, 20.57]7.48818731539410360.023657060728893706
Test #2[20.85, 21.89, 23.41, 35.09, 30.02, 26.52, 20.85, 21.89]3.299827503302760.19206647332255408
Test #3[20.13, 20.5, 21.7, 22.02, 32.2, 32.79, 33.9, 34.22, 20.13, 20.5]6.98684338513643240.030396685911910585

5.23 - pair_probabilities_fl()

This article describes the pair_probabilities_fl() user-defined function.

Calculate various probabilities and related metrics for a pair of categorical variables.

The function pair_probabilities_fl() is a UDF (user-defined function) that calculates the following probabilities and related metrics for a pair of categorical variables, A and B:

  • P(A) is the probability of each value A=a
  • P(B) is the probability of each value B=b
  • P(A|B) is the conditional probability of A=a given B=b
  • P(B|A) is the conditional probability of B=b given A=a
  • P(A∪B) is the union probability (A=a or B=b)
  • P(A∩B) is the intersection probability (A=a and B=b)
  • The lift metric is calculated as P(A∩B)/(P(A)*P(B)). For more information, see lift metric. A worked example follows this list.
    • A lift near 1 means that the joint probability of the two values is similar to what would be expected if both variables were independent.
    • Lift ≫ 1 means that the values co-occur more often than expected under the independence assumption.
    • Lift ≪ 1 means that the values are less likely to co-occur than expected under the independence assumption.
  • The Jaccard similarity coefficient is calculated as P(A∩B)/P(A∪B). For more information, see Jaccard similarity coefficient.
    • A high Jaccard coefficient, close to 1, means that the values tend to occur together.
    • A low Jaccard coefficient, close to 0, means that the values tend to stay apart.
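To make the arithmetic concrete, here’s a minimal worked example with assumed probabilities P(A) = 0.4, P(B) = 0.5, and P(A∩B) = 0.3 (not taken from the dance-class data below):

// Worked example with assumed probabilities
print P_A = 0.4, P_B = 0.5, P_AB = 0.3
| extend P_AUB = P_A + P_B - P_AB        // union probability = 0.6
| extend Lift_AB = P_AB / (P_A * P_B)    // lift = 1.5, the values co-occur more often than expected
| extend Jaccard_AB = P_AB / P_AUB       // Jaccard coefficient = 0.5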

Syntax

T | invoke pair_probabilities_fl(A, B, Scope)

Parameters

NameTypeRequiredDescription
Ascalar✔️The first categorical variable.
Bscalar✔️The second categorical variable.
Scopescalar✔️The field that contains the scope, so that the probabilities for A and B are calculated independently for each scope value.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA  = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB  = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope           // probability for each value of A
| join kind = leftouter (probB) on _B, _scope           // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB                       // union probability
       , P_AIB = P_AB/P_B                               // conditional probability of A on B
       , P_BIA = P_AB/P_A                               // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B)                     // lift metric
       , Jaccard_AB = P_AB/P_AUB                        // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate probabilities and related metrics for a pair of categorical variables")
pair_probabilities_fl (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA  = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB  = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope           // probability for each value of A
| join kind = leftouter (probB) on _B, _scope           // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB                       // union probability
       , P_AIB = P_AB/P_B                               // conditional probability of A on B
       , P_BIA = P_AB/P_A                               // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B)                     // lift metric
       , Jaccard_AB = P_AB/P_AUB                        // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA  = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB  = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope           // probability for each value of A
| join kind = leftouter (probB) on _B, _scope           // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB                       // union probability
       , P_AIB = P_AB/P_B                               // conditional probability of A on B
       , P_BIA = P_AB/P_A                               // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B)                     // lift metric
       , Jaccard_AB = P_AB/P_AUB                        // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
};
//
let dancePairs = datatable(boy:string, girl:string, dance_class:string)[
    'James',   'Mary',      'Modern',
    'James',   'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Patricia',  'Modern',
    'Robert',  'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Classic',
    'Robert',  'Linda',     'Classic',
    'James',   'Mary',      'Classic',
    'James',   'Linda',     'Classic'
];
dancePairs
| invoke pair_probabilities_fl('boy','girl', 'dance_class')

Stored

let dancePairs = datatable(boy:string, girl:string, dance_class:string)[
    'James',   'Mary',      'Modern',
    'James',   'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Patricia',  'Modern',
    'Robert',  'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Classic',
    'Robert',  'Linda',     'Classic',
    'James',   'Mary',      'Classic',
    'James',   'Linda',     'Classic'
];
dancePairs
| invoke pair_probabilities_fl('boy','girl', 'dance_class')

Output

Let’s look at the list of pairs of people dancing at two dance classes, supposedly at random, to find out whether anything looks anomalous (meaning, not random). We’ll start by looking at each class by itself.

The Michael-Patricia pair has a lift metric of 2.375, which is significantly above 1. This value means that they’re seen together much more often than would be expected if the pairing were random. Their Jaccard coefficient is 0.75, which is close to 1: when either of them dances, they usually dance together.

| A | B | scope | P_A | P_B | P_AB | P_AUB | P_AIB | P_BIA | Lift_AB | Jaccard_AB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Robert | Patricia | Modern | 0.31578 | 0.42105 | 0.05263 | 0.68421 | 0.12499 | 0.16666 | 0.39583 | 0.07692 |
| Robert | Mary | Modern | 0.31578 | 0.26315 | 0.15789 | 0.42105 | 0.59999 | 0.49999 | 1.89999 | 0.37499 |
| Robert | Linda | Modern | 0.31578 | 0.31578 | 0.10526 | 0.52631 | 0.33333 | 0.33333 | 1.05555 | 0.2 |
| Michael | Patricia | Modern | 0.31578 | 0.42105 | 0.31578 | 0.42105 | 0.75 | 0.99999 | 2.375 | 0.75 |
| James | Patricia | Modern | 0.36842 | 0.42105 | 0.05263 | 0.73684 | 0.12499 | 0.14285 | 0.33928 | 0.07142 |
| James | Mary | Modern | 0.36842 | 0.26315 | 0.10526 | 0.52631 | 0.4 | 0.28571 | 1.08571 | 0.2 |
| James | Linda | Modern | 0.36842 | 0.31578 | 0.21052 | 0.47368 | 0.66666 | 0.57142 | 1.80952 | 0.44444 |
| Robert | Mary | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
| Robert | Linda | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
| James | Mary | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
| James | Linda | Classic | 0.49999 | 0.49999 | 0.24999 | 0.75 | 0.49999 | 0.49999 | 0.99999 | 0.33333 |
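
As a sanity check, the lift and Jaccard values for the Michael-Patricia pair can be reproduced from the (rounded) probabilities in the table above. The following minimal sketch recomputes them, so the results are approximate:

print P_A = 0.31578, P_B = 0.42105, P_AB = 0.31578, P_AUB = 0.42105
| extend Lift_AB = P_AB/(P_A * P_B)     // ~2.375
| extend Jaccard_AB = P_AB/P_AUB        // ~0.75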

5.24 - pairwise_dist_fl()

Learn how to use the pairwise_dist_fl() function to calculate the multivariate distance between data points in the same partition.

Calculate pairwise distances between entities based on multiple nominal and numerical variables.

The function pairwise_dist_fl() is a UDF (user-defined function) that calculates the multivariate distance between data points belonging to the same partition, taking into account nominal and numerical variables.

  • All string fields, besides entity and partition names, are considered nominal variables. The distance is equal to 1 if the values are different, and 0 if they’re the same.
  • All numerical fields are considered numerical variables. They’re normalized by transforming to z-scores and the distance is calculated as the absolute value of the difference. The total multivariate distance between data points is calculated as the average of the distances between variables.

A distance close to zero means that the entities are similar, while a distance above 1 means they’re different. Similarly, an entity whose average distance is close to or above 1 differs from many other entities in the partition, making it a potential outlier.
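
For intuition, the following hypothetical sketch shows how the per-variable distances combine for two entities described by one nominal variable and one numerical variable that has already been transformed to a z-score (the column names are illustrative only and aren’t part of the function):

print gender1 = 'M', gender2 = 'F', z_height1 = 0.8, z_height2 = -0.3
| extend d_nominal = todouble(gender1 != gender2)   // 1.0, because the nominal values differ
| extend d_numeric = abs(z_height1 - z_height2)     // 1.1, absolute difference of the z-scores
| extend dist = (d_nominal + d_numeric) / 2.0       // average over the variables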

Syntax

pairwise_dist_fl(entity, partition)

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| entity | string | ✔️ | The name of the input table column containing the names or IDs of the entities for which the distances will be calculated. |
| partition | string | ✔️ | The name of the input table column containing the partition or scope, so that the distances are calculated for all pairs of entities under the same partition. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let pairwise_dist_fl = (tbl:(*), id_col:string, partition_col:string)
{
    let generic_dist = (value1:dynamic, value2:dynamic) 
    {
        // Calculates the distance between two values; treats all strings as nominal values and numbers as numerical,
        // can be extended to other data types or tweaked by adding weights or changing formulas.
            iff(gettype(value1[0]) == "string", todouble(tostring(value1[0]) != tostring(value2[0])), abs(todouble(value1[0]) - todouble(value2[0])))
    };
    let T = (tbl | extend _entity = column_ifexists(id_col, ''), _partition = column_ifexists(partition_col, '') | project-reorder _entity, _partition);
    let sum_data = (
        // Calculates summary statistics to be used for normalization.
        T
        | project-reorder _entity
        | project _partition, p = pack_array(*)
        | mv-expand with_itemindex=idx p
        | summarize count(), avg(todouble(p)), stdev(todouble(p)) by _partition, idx
        | sort by _partition, idx asc
        | summarize make_list(avg_p), make_list(stdev_p) by _partition
    );
    let normalized_data = (
        // Performs normalization on numerical variables by subtracting the mean and scaling by the standard deviation. Other normalization techniques can be used
        // by adding metrics to previous function and using here.
        T
        | project _partition, p = pack_array(*)
        | join kind = leftouter (sum_data) on _partition
        | mv-apply p, list_avg_p, list_stdev_p on (
            extend normalized = iff((not(isnan(todouble(list_avg_p))) and (list_stdev_p > 0)), pack_array((todouble(p) - todouble(list_avg_p))/todouble(list_stdev_p)), p)
            | summarize a = make_list(normalized) by _partition
        )
        | project _partition, a
    );
    let dist_data = (
        // Calculates distances of included variables and sums them up to get a multivariate distance between all entities under the same partition.
        normalized_data
        | join kind = inner (normalized_data) on _partition
        | project entity = tostring(a[0]), entity1 = tostring(a1[0]), a = array_slice(a, 1, -1), a1 = array_slice(a1, 1, -1), _partition
        | mv-apply a, a1 on 
        (
            project d = generic_dist(pack_array(a), pack_array(a1))
            | summarize d = make_list(d)
        )
        | extend dist = bin((1.0*array_sum(d)-1.0)/array_length(d), 0.0001) // -1 cancels the artifact distance calculated between entity names appearing in the bag and normalizes by number of features        
        | project-away d
        | where entity != entity1
        | sort by _partition asc, entity asc, dist asc
    );
    dist_data
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate distances between pairs of entities based on multiple nominal and numerical variables")
pairwise_dist_fl (tbl:(*), id_col:string, partition_col:string)
{
    let generic_dist = (value1:dynamic, value2:dynamic) 
    {
        // Calculates the distance between two values; treats all strings as nominal values and numbers as numerical,
        // can be extended to other data types or tweaked by adding weights or changing formulas.
            iff(gettype(value1[0]) == "string", todouble(tostring(value1[0]) != tostring(value2[0])), abs(todouble(value1[0]) - todouble(value2[0])))
    };
    let T = (tbl | extend _entity = column_ifexists(id_col, ''), _partition = column_ifexists(partition_col, '') | project-reorder _entity, _partition);
    let sum_data = (
        // Calculates summary statistics to be used for normalization.
        T
        | project-reorder _entity
        | project _partition, p = pack_array(*)
        | mv-expand with_itemindex=idx p
        | summarize count(), avg(todouble(p)), stdev(todouble(p)) by _partition, idx
        | sort by _partition, idx asc
        | summarize make_list(avg_p), make_list(stdev_p) by _partition
    );
    let normalized_data = (
        // Performs normalization on numerical variables by subtracting the mean and scaling by the standard deviation. Other normalization techniques can be used
        // by adding metrics to previous function and using here.
        T
        | project _partition, p = pack_array(*)
        | join kind = leftouter (sum_data) on _partition
        | mv-apply p, list_avg_p, list_stdev_p on (
            extend normalized = iff((not(isnan(todouble(list_avg_p))) and (list_stdev_p > 0)), pack_array((todouble(p) - todouble(list_avg_p))/todouble(list_stdev_p)), p)
            | summarize a = make_list(normalized) by _partition
        )
        | project _partition, a
    );
    let dist_data = (
        // Calculates distances of included variables and sums them up to get a multivariate distance between all entities under the same partition.
        normalized_data
        | join kind = inner (normalized_data) on _partition
        | project entity = tostring(a[0]), entity1 = tostring(a1[0]), a = array_slice(a, 1, -1), a1 = array_slice(a1, 1, -1), _partition
        | mv-apply a, a1 on 
        (
            project d = generic_dist(pack_array(a), pack_array(a1))
            | summarize d = make_list(d)
        )
        | extend dist = bin((1.0*array_sum(d)-1.0)/array_length(d), 0.0001) // -1 cancels the artifact distance calculated between entity names appearing in the bag and normalizes by number of features        
        | project-away d
        | where entity != entity1
        | sort by _partition asc, entity asc, dist asc
    );
    dist_data
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let pairwise_dist_fl = (tbl:(*), id_col:string, partition_col:string)
{
    let generic_dist = (value1:dynamic, value2:dynamic) 
    {
        // Calculates the distance between two values; treats all strings as nominal values and numbers as numerical,
        // can be extended to other data types or tweaked by adding weights or changing formulas.
            iff(gettype(value1[0]) == "string", todouble(tostring(value1[0]) != tostring(value2[0])), abs(todouble(value1[0]) - todouble(value2[0])))
    };
    let T = (tbl | extend _entity = column_ifexists(id_col, ''), _partition = column_ifexists(partition_col, '') | project-reorder _entity, _partition);
    let sum_data = (
        // Calculates summary statistics to be used for normalization.
        T
        | project-reorder _entity
        | project _partition, p = pack_array(*)
        | mv-expand with_itemindex=idx p
        | summarize count(), avg(todouble(p)), stdev(todouble(p)) by _partition, idx
        | sort by _partition, idx asc
        | summarize make_list(avg_p), make_list(stdev_p) by _partition
    );
    let normalized_data = (
        // Performs normalization on numerical variables by subtracting the mean and scaling by the standard deviation. Other normalization techniques can be used
        // by adding metrics to previous function and using here.
        T
        | project _partition, p = pack_array(*)
        | join kind = leftouter (sum_data) on _partition
        | mv-apply p, list_avg_p, list_stdev_p on (
            extend normalized = iff((not(isnan(todouble(list_avg_p))) and (list_stdev_p > 0)), pack_array((todouble(p) - todouble(list_avg_p))/todouble(list_stdev_p)), p)
            | summarize a = make_list(normalized) by _partition
        )
        | project _partition, a
    );
    let dist_data = (
        // Calculates distances of included variables and sums them up to get a multivariate distance between all entities under the same partition.
        normalized_data
        | join kind = inner (normalized_data) on _partition
        | project entity = tostring(a[0]), entity1 = tostring(a1[0]), a = array_slice(a, 1, -1), a1 = array_slice(a1, 1, -1), _partition
        | mv-apply a, a1 on 
        (
            project d = generic_dist(pack_array(a), pack_array(a1))
            | summarize d = make_list(d)
        )
        | extend dist = bin((1.0*array_sum(d)-1.0)/array_length(d), 0.0001) // -1 cancels the artifact distance calculated between entity names appearing in the bag and normalizes by number of features        
        | project-away d
        | where entity != entity1
        | sort by _partition asc, entity asc, dist asc
    );
    dist_data
};
//
let raw_data = datatable(name:string, gender: string, height:int, weight:int, limbs:int, accessory:string, type:string)[
    'Andy',     'M',    160,    80,     4,  'Hat',      'Person',
    'Betsy',    'F',    170,    70,     4,  'Bag',      'Person',
    'Cindy',    'F',    130,    30,     4,  'Hat',      'Person',
    'Dan',      'M',    190,    105,    4,  'Hat',      'Person',
    'Elmie',    'M',    110,    30,     4,  'Toy',      'Person',
    'Franny',   'F',    170,    65,     4,  'Bag',      'Person',
    'Godzilla', '?',    260,    210,    5,  'Tail',     'Person',
    'Hannie',   'F',    112,    28,     4,  'Toy',      'Person',
    'Ivie',     'F',    105,    20,     4,  'Toy',      'Person',
    'Johnnie',  'M',    107,    21,     4,  'Toy',      'Person',
    'Kyle',     'M',    175,    76,     4,  'Hat',      'Person',
    'Laura',    'F',    180,    70,     4,  'Bag',      'Person',
    'Mary',     'F',    160,    60,     4,  'Bag',      'Person',
    'Noah',     'M',    178,    90,     4,  'Hat',      'Person',
    'Odelia',   'F',    186,    76,     4,  'Bag',      'Person',
    'Paul',     'M',    158,    69,     4,  'Bag',      'Person',
    'Qui',      'F',    168,    62,     4,  'Bag',      'Person',
    'Ronnie',   'M',    108,    26,     4,  'Toy',      'Person',
    'Sonic',    'F',    52,     20,     6,  'Tail',     'Pet',
    'Tweety',   'F',    52,     20,     6,  'Tail',     'Pet' ,
    'Ulfie',    'M',    39,     29,     4,  'Wings',    'Pet',
    'Vinnie',   'F',    53,     22,     1,  'Tail',     'Pet',
    'Waldo',    'F',    51,     21,     4,  'Tail',     'Pet',
    'Xander',   'M',    50,     24,     4,  'Tail',     'Pet'
];
raw_data
| invoke pairwise_dist_fl('name', 'type')
| where _partition == 'Person' | sort by entity asc, entity1 asc
| evaluate pivot (entity, max(dist), entity1) | sort by entity1 asc

Stored

let raw_data = datatable(name:string, gender: string, height:int, weight:int, limbs:int, accessory:string, type:string)[
    'Andy',     'M',    160,    80,     4,  'Hat',      'Person',
    'Betsy',    'F',    170,    70,     4,  'Bag',      'Person',
    'Cindy',    'F',    130,    30,     4,  'Hat',      'Person',
    'Dan',      'M',    190,    105,    4,  'Hat',      'Person',
    'Elmie',    'M',    110,    30,     4,  'Toy',      'Person',
    'Franny',   'F',    170,    65,     4,  'Bag',      'Person',
    'Godzilla', '?',    260,    210,    5,  'Tail',     'Person',
    'Hannie',   'F',    112,    28,     4,  'Toy',      'Person',
    'Ivie',     'F',    105,    20,     4,  'Toy',      'Person',
    'Johnnie',  'M',    107,    21,     4,  'Toy',      'Person',
    'Kyle',     'M',    175,    76,     4,  'Hat',      'Person',
    'Laura',    'F',    180,    70,     4,  'Bag',      'Person',
    'Mary',     'F',    160,    60,     4,  'Bag',      'Person',
    'Noah',     'M',    178,    90,     4,  'Hat',      'Person',
    'Odelia',   'F',    186,    76,     4,  'Bag',      'Person',
    'Paul',     'M',    158,    69,     4,  'Bag',      'Person',
    'Qui',      'F',    168,    62,     4,  'Bag',      'Person',
    'Ronnie',   'M',    108,    26,     4,  'Toy',      'Person',
    'Sonic',    'F',    52,     20,     6,  'Tail',     'Pet',
    'Tweety',   'F',    52,     20,     6,  'Tail',     'Pet' ,
    'Ulfie',    'M',    39,     29,     4,  'Wings',    'Pet',
    'Vinnie',   'F',    53,     22,     1,  'Tail',     'Pet',
    'Woody',    'F',    51,     21,     4,  'Tail',     'Pet',
    'Xander',   'M',    50,     24,     4,  'Tail',     'Pet'
];
raw_data
| invoke pairwise_dist_fl('name', 'type')
| where _partition == 'Person' | sort by entity asc, entity1 asc
| evaluate pivot (entity, max(dist), entity1) | sort by entity1 asc

Output

| entity1 | Andy | Betsy | Cindy | Dan | Elmie | Franny | Godzilla | Hannie |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Andy | | 0.354 | 0.4125 | 0.1887 | 0.4843 | 0.3702 | 1.2087 | 0.6265 |
| Betsy | 0.354 | | 0.416 | 0.4708 | 0.6307 | 0.0161 | 1.2051 | 0.4872 |
| Cindy | 0.4125 | 0.416 | | 0.6012 | 0.3575 | 0.3998 | 1.4783 | 0.214 |
| Dan | 0.1887 | 0.4708 | 0.6012 | | 0.673 | 0.487 | 1.0199 | 0.8152 |
| Elmie | 0.4843 | 0.6307 | 0.3575 | 0.673 | | 0.6145 | 1.5502 | 0.1565 |
| Franny | 0.3702 | 0.0161 | 0.3998 | 0.487 | 0.6145 | | 1.2213 | 0.471 |
| Godzilla | 1.2087 | 1.2051 | 1.4783 | 1.0199 | 1.5502 | 1.2213 | | 1.5495 |
| Hannie | 0.6265 | 0.4872 | 0.214 | 0.8152 | 0.1565 | 0.471 | 1.5495 | |

Looking at entities of two different types, we’d like to calculate the distance between entities of the same type, taking into account both nominal variables (such as gender or preferred accessory) and numerical variables (such as the number of limbs, height, and weight). The numerical variables are on different scales, so they must be centralized and scaled, which the function does automatically. The output is pairs of entities under the same partition with their calculated multivariate distance. It can be analyzed directly, visualized as a distance matrix or scatterplot, or used as input data for an outlier detection algorithm by calculating the mean distance per entity, where entities with high values indicate global outliers. For example, adding an optional visualization using a distance matrix produces a table like the sample shown above. From the sample, you can see that:

  • Some pairs of entities (Betsy and Franny) have a low distance value (close to 0) indicating they’re similar.
  • Some pairs of entities (Godzilla and Elmie) have a high distance value (1 or above) indicating they’re different.

The output can further be used to calculate the average distance per entity. A high average distance might indicate a global outlier. For example, we can see that, on average, Godzilla has a high distance from the others, indicating that it’s a probable global outlier.
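
For example, a follow-up query along these lines (a sketch, assuming the same raw_data table and function shown above) computes the average distance per entity within each partition, so that entities with the highest averages can be flagged:

raw_data
| invoke pairwise_dist_fl('name', 'type')
| summarize avg_dist = avg(dist) by _partition, entity
| sort by _partition asc, avg_dist desc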

5.25 - percentiles_linear_fl()

Learn how to use the percentiles_linear_fl() function to calculate percentiles using the linear interpolation between closest ranks.

The function percentiles_linear_fl() is a user-defined function (UDF) that calculates percentiles using linear interpolation between closest ranks, the same method used by Excel’s PERCENTILE.INC function. The native Kusto percentile functions use the nearest-rank method. For large sets of values the difference between the two methods is insignificant, and we recommend using the native functions for best performance. For more details on these and other percentile calculation methods, see the percentile article on Wikipedia. The function accepts a table containing the column to calculate on, an optional grouping key, and a dynamic array of the required percentiles, and returns a column containing a dynamic array of the percentile values for each group.
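
As an illustration of the method, the 25th percentile of [5, 7, 9] falls at fractional index 0.25*(3-1) = 0.5, so the result is interpolated halfway between 5 and 7. The following sketch reproduces that calculation using the same formulas as the function:

print vals = dynamic([5, 7, 9]), pct = 25.0
| extend n = array_length(vals)
| extend index = pct/100.0*(n-1)                                     // 0.5
| extend low_index = tolong(floor(index, 1)), high_index = tolong(ceiling(index))
| extend pct_val = todouble(vals[low_index]) + (index - low_index)*(todouble(vals[high_index]) - todouble(vals[low_index]))    // 6.0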

Syntax

T | invoke percentiles_linear_fl(val_col, pct_arr [, aggr_col ])

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| val_col | string | ✔️ | The name of the column that contains the values with which to calculate the percentiles. |
| pct_arr | dynamic | ✔️ | A numerical array containing the required percentiles. Each percentile should be in the range [0-100]. |
| aggr_col | string | | The name of the column that contains the grouping key. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
    tbl
    | extend _vals = column_ifexists(val_col, 0.0)
    | extend _key = column_ifexists(aggr_col, 'ALL')
    | order by _key asc, _vals asc 
    | summarize _vals=make_list(_vals) by _key
    | extend n = array_length(_vals)
    | extend pct=pct_arr
    | mv-apply pct to typeof(real) on (
          extend index=pct/100.0*(n-1)
        | extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
        | extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
        | extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
        | summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
    | project-away n
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate linear interpolated percentiles (identical to Excel's PERCENTILE.INC)")
percentiles_linear_fl(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
    tbl
    | extend _vals = column_ifexists(val_col, 0.0)
    | extend _key = column_ifexists(aggr_col, 'ALL')
    | order by _key asc, _vals asc 
    | summarize _vals=make_list(_vals) by _key
    | extend n = array_length(_vals)
    | extend pct=pct_arr
    | mv-apply pct to typeof(real) on (
          extend index=pct/100.0*(n-1)
        | extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
        | extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
        | extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
        | summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
    | project-away n
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
    tbl
    | extend _vals = column_ifexists(val_col, 0.0)
    | extend _key = column_ifexists(aggr_col, 'ALL')
    | order by _key asc, _vals asc 
    | summarize _vals=make_list(_vals) by _key
    | extend n = array_length(_vals)
    | extend pct=pct_arr
    | mv-apply pct to typeof(real) on (
          extend index=pct/100.0*(n-1)
        | extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
        | extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
        | extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
        | summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
    | project-away n
};
datatable(x:long, name:string) [
5, 'A',
9, 'A',
7, 'A',
5, 'B',
7, 'B',
7, 'B',
10, 'B',
]
| invoke percentiles_linear_fl('x', dynamic([0, 25, 50, 75, 100]), 'name')
| project-rename name=_key, x=_vals

Stored

datatable(x:long, name:string) [
5, 'A',
9, 'A',
7, 'A',
5, 'B',
7, 'B',
7, 'B',
10, 'B',
]
| invoke percentiles_linear_fl('x', dynamic([0, 25, 50, 75, 100]), 'name')
| project-rename name=_key, x=_vals

Output

| name | x | pct_arr | pct_val |
| --- | --- | --- | --- |
| A | [5,7,9] | [0,25,50,75,100] | [5,6,7,8,9] |
| B | [5,7,7,10] | [0,25,50,75,100] | [5,6.5,7,7.75,10] |

5.26 - perm_fl()

This article describes perm_fl() user-defined function.

Calculate P(n, k)

The function perm_fl() is a user-defined function (UDF) that calculates P(n, k), the number of permutations for selection of k items out of n, with order. It’s based on the native gamma() function to calculate the factorial (see factorial_fl()). For selection of k items without order, use comb_fl().
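
For example, P(6, 4) = 6!/(6-4)! = 720/2 = 360. You can check this directly with the same gamma()-based formula the function uses:

print n = 6, k = 4
| extend pnk = tolong(gamma(n+1) / gamma(n-k+1))    // 360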

Syntax

perm_fl(n, k)

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| n | int | ✔️ | The total number of items. |
| k | int | ✔️ | The number of selected items. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let perm_fl=(n:int, k:int)
{
    let fact_n = gamma(n+1);
    let fact_nk = gamma(n-k+1);
    tolong(fact_n/fact_nk)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate number of permutations for selection of k items out of n items with order")
perm_fl(n:int, k:int)
{
    let fact_n = gamma(n+1);
    let fact_nk = gamma(n-k+1);
    tolong(fact_n/fact_nk)
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let perm_fl=(n:int, k:int)
{
    let fact_n = gamma(n+1);
    let fact_nk = gamma(n-k+1);
    tolong(fact_n/fact_nk)
}
;
range n from 3 to 10 step 3
| extend k = n-2
| extend pnk = perm_fl(n, k)

Stored

range n from 3 to 10 step 3
| extend k = n-2
| extend pnk = perm_fl(n, k)

Output

| n | k | pnk |
| --- | --- | --- |
| 3 | 1 | 3 |
| 6 | 4 | 360 |
| 9 | 7 | 181440 |

5.27 - plotly_anomaly_fl()

Learn how to use the plotly_anomaly_fl() user-defined function.

The function plotly_anomaly_fl() is a user-defined function (UDF) that allows you to customize a plotly template to create an interactive anomaly chart.

The function accepts a table containing the source and the baseline time series, lists of positive and negative anomalies with their respective sizes, and a chart labeling string. The function returns a single-cell table containing plotly JSON. Optionally, you can render the data in an Azure Data Explorer dashboard tile or in a Real-Time dashboard tile. For more information, see Plotly (preview).

Prerequisite

Extract the required ‘anomaly’ template from the publicly available PlotlyTemplate table. Copy this table from the Samples database to your database by running the following KQL command from your target database:

.set PlotlyTemplate <| cluster('help.kusto.windows.net').database('Samples').PlotlyTemplate
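
After running the command, you can optionally verify that the template is available in your database, for example:

PlotlyTemplate
| where name == 'anomaly'
| count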

Syntax

T | invoke plotly_anomaly_fl(time_col, val_col, baseline_col, time_high_col, val_high_col, size_high_col, time_low_col, val_low_col, size_low_col, chart_title, series_name, val_name)

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| time_col | string | ✔️ | The name of the column containing the dynamic array of the time points of the original time series. |
| val_col | string | ✔️ | The name of the column containing the values of the original time series. |
| baseline_col | string | ✔️ | The name of the column containing the values of the baseline time series. Anomalies are usually detected by large value offset from the expected baseline value. |
| time_high_col | string | ✔️ | The name of the column containing the time points of high (above the baseline) anomalies. |
| val_high_col | string | ✔️ | The name of the column containing the values of the high anomalies. |
| size_high_col | string | ✔️ | The name of the column containing the marker sizes of the high anomalies. |
| time_low_col | string | ✔️ | The name of the column containing the time points of low anomalies. |
| val_low_col | string | ✔️ | The name of the column containing the values of the low anomalies. |
| size_low_col | string | ✔️ | The name of the column containing the marker sizes of the low anomalies. |
| chart_title | string | | Chart title. The default is ‘Anomaly Chart’. |
| series_name | string | | Time series name. The default is ‘Metric’. |
| val_name | string | | Value axis name. The default is ‘Value’. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let plotly_anomaly_fl=(tbl:(*), time_col:string, val_col:string, baseline_col:string, time_high_col:string , val_high_col:string, size_high_col:string,
                                time_low_col:string, val_low_col:string, size_low_col:string,
                                chart_title:string='Anomaly chart', series_name:string='Metric', val_name:string='Value')
{
    let anomaly_chart = toscalar(PlotlyTemplate | where name == "anomaly" | project plotly);
    let tbl_ex = tbl | extend _timestamp = column_ifexists(time_col, datetime(null)), _values = column_ifexists(val_col, 0.0), _baseline = column_ifexists(baseline_col, 0.0),
                              _high_timestamp = column_ifexists(time_high_col, datetime(null)), _high_values = column_ifexists(val_high_col, 0.0), _high_size = column_ifexists(size_high_col, 1),
                              _low_timestamp = column_ifexists(time_low_col, datetime(null)), _low_values = column_ifexists(val_low_col, 0.0), _low_size = column_ifexists(size_low_col, 1);
    tbl_ex
    | extend plotly = anomaly_chart
    | extend plotly=replace_string(plotly, '$TIME_STAMPS$', tostring(_timestamp))
    | extend plotly=replace_string(plotly, '$SERIES_VALS$', tostring(_values))
    | extend plotly=replace_string(plotly, '$BASELINE_VALS$', tostring(_baseline))
    | extend plotly=replace_string(plotly, '$TIME_STAMPS_HIGH_ANOMALIES$', tostring(_high_timestamp))
    | extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_VALS$', tostring(_high_values))
    | extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_MARKER_SIZE$', tostring(_high_size))
    | extend plotly=replace_string(plotly, '$TIME_STAMPS_LOW_ANOMALIES$', tostring(_low_timestamp))
    | extend plotly=replace_string(plotly, '$LOW_ANOMALIES_VALS$', tostring(_low_values))
    | extend plotly=replace_string(plotly, '$LOW_ANOMALIES_MARKER_SIZE$', tostring(_low_size))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | extend plotly=replace_string(plotly, '$SERIES_NAME$', series_name)
    | extend plotly=replace_string(plotly, '$Y_NAME$', val_name)
    | project plotly
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Plotly", docstring = "Render anomaly chart using plotly template")
plotly_anomaly_fl(tbl:(*), time_col:string, val_col:string, baseline_col:string, time_high_col:string , val_high_col:string, size_high_col:string,
                                time_low_col:string, val_low_col:string, size_low_col:string,
                                chart_title:string='Anomaly chart', series_name:string='Metric', val_name:string='Value')
{
    let anomaly_chart = toscalar(PlotlyTemplate | where name == "anomaly" | project plotly);
    let tbl_ex = tbl | extend _timestamp = column_ifexists(time_col, datetime(null)), _values = column_ifexists(val_col, 0.0), _baseline = column_ifexists(baseline_col, 0.0),
                              _high_timestamp = column_ifexists(time_high_col, datetime(null)), _high_values = column_ifexists(val_high_col, 0.0), _high_size = column_ifexists(size_high_col, 1),
                              _low_timestamp = column_ifexists(time_low_col, datetime(null)), _low_values = column_ifexists(val_low_col, 0.0), _low_size = column_ifexists(size_low_col, 1);
    tbl_ex
    | extend plotly = anomaly_chart
    | extend plotly=replace_string(plotly, '$TIME_STAMPS$', tostring(_timestamp))
    | extend plotly=replace_string(plotly, '$SERIES_VALS$', tostring(_values))
    | extend plotly=replace_string(plotly, '$BASELINE_VALS$', tostring(_baseline))
    | extend plotly=replace_string(plotly, '$TIME_STAMPS_HIGH_ANOMALIES$', tostring(_high_timestamp))
    | extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_VALS$', tostring(_high_values))
    | extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_MARKER_SIZE$', tostring(_high_size))
    | extend plotly=replace_string(plotly, '$TIME_STAMPS_LOW_ANOMALIES$', tostring(_low_timestamp))
    | extend plotly=replace_string(plotly, '$LOW_ANOMALIES_VALS$', tostring(_low_values))
    | extend plotly=replace_string(plotly, '$LOW_ANOMALIES_MARKER_SIZE$', tostring(_low_size))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | extend plotly=replace_string(plotly, '$SERIES_NAME$', series_name)
    | extend plotly=replace_string(plotly, '$Y_NAME$', val_name)
    | project plotly
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let plotly_anomaly_fl=(tbl:(*), time_col:string, val_col:string, baseline_col:string, time_high_col:string , val_high_col:string, size_high_col:string,
                                time_low_col:string, val_low_col:string, size_low_col:string,
                                chart_title:string='Anomaly chart', series_name:string='Metric', val_name:string='Value')
{
    let anomaly_chart = toscalar(PlotlyTemplate | where name == "anomaly" | project plotly);
    let tbl_ex = tbl | extend _timestamp = column_ifexists(time_col, datetime(null)), _values = column_ifexists(val_col, 0.0), _baseline = column_ifexists(baseline_col, 0.0),
                              _high_timestamp = column_ifexists(time_high_col, datetime(null)), _high_values = column_ifexists(val_high_col, 0.0), _high_size = column_ifexists(size_high_col, 1),
                              _low_timestamp = column_ifexists(time_low_col, datetime(null)), _low_values = column_ifexists(val_low_col, 0.0), _low_size = column_ifexists(size_low_col, 1);
    tbl_ex
    | extend plotly = anomaly_chart
    | extend plotly=replace_string(plotly, '$TIME_STAMPS$', tostring(_timestamp))
    | extend plotly=replace_string(plotly, '$SERIES_VALS$', tostring(_values))
    | extend plotly=replace_string(plotly, '$BASELINE_VALS$', tostring(_baseline))
    | extend plotly=replace_string(plotly, '$TIME_STAMPS_HIGH_ANOMALIES$', tostring(_high_timestamp))
    | extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_VALS$', tostring(_high_values))
    | extend plotly=replace_string(plotly, '$HIGH_ANOMALIES_MARKER_SIZE$', tostring(_high_size))
    | extend plotly=replace_string(plotly, '$TIME_STAMPS_LOW_ANOMALIES$', tostring(_low_timestamp))
    | extend plotly=replace_string(plotly, '$LOW_ANOMALIES_VALS$', tostring(_low_values))
    | extend plotly=replace_string(plotly, '$LOW_ANOMALIES_MARKER_SIZE$', tostring(_low_size))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | extend plotly=replace_string(plotly, '$SERIES_NAME$', series_name)
    | extend plotly=replace_string(plotly, '$Y_NAME$', val_name)
    | project plotly
};
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let marker_scale = 8;
let s_name = 'TS1';
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == s_name
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| mv-apply num1=num to typeof(double), anomalies1=anomalies to typeof(double), score1=score to typeof(double), TimeStamp1=TimeStamp to typeof(datetime)  on (
    summarize pAnomalies=make_list_if(num1, anomalies1 > 0), pTimeStamp=make_list_if(TimeStamp1, anomalies1 > 0), pSize=make_list_if(toint(score1*marker_scale), anomalies1 > 0),
              nAnomalies=make_list_if(num1, anomalies1 < 0), nTimeStamp=make_list_if(TimeStamp1, anomalies1 < 0), nSize=make_list_if(toint(-score1*marker_scale), anomalies1 < 0)
)
| invoke plotly_anomaly_fl('TimeStamp', 'num', 'baseline', 'pTimeStamp', 'pAnomalies', 'pSize', 'nTimeStamp', 'nAnomalies', 'nSize',
                           chart_title='Anomaly chart using plotly_anomaly_fl()', series_name=s_name, val_name='# of requests')
| render plotly

Stored

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let marker_scale = 8;
let s_name = 'TS1';
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid
| where sid == s_name
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| mv-apply num1=num to typeof(double), anomalies1=anomalies to typeof(double), score1=score to typeof(double), TimeStamp1=TimeStamp to typeof(datetime)  on (
    summarize pAnomalies=make_list_if(num1, anomalies1 > 0), pTimeStamp=make_list_if(TimeStamp1, anomalies1 > 0), pSize=make_list_if(toint(score1*marker_scale), anomalies1 > 0),
              nAnomalies=make_list_if(num1, anomalies1 < 0), nTimeStamp=make_list_if(TimeStamp1, anomalies1 < 0), nSize=make_list_if(toint(-score1*marker_scale), anomalies1 < 0)
)
| invoke plotly_anomaly_fl('TimeStamp', 'num', 'baseline', 'pTimeStamp', 'pAnomalies', 'pSize', 'nTimeStamp', 'nAnomalies', 'nSize',
                           chart_title='Anomaly chart using plotly_anomaly_fl()', series_name=s_name, val_name='# of requests')
| render plotly

Output

The output is a Plotly JSON string that can be rendered using ‘| render plotly’, in an Azure Data Explorer dashboard tile, or in a Real-Time dashboard tile. For more information on creating dashboard tiles, see Visualize data with Azure Data Explorer dashboards and Real-Time dashboards.

The following image shows a sample anomaly chart using the above function:

Screenshot of anomaly chart of the sample dataset.

You can zoom in and hover over anomalies:

Screenshot of zoom in anomalous region. Screenshot of hover over anomaly.

5.28 - plotly_gauge_fl()

Learn how to use the plotly_gauge_fl() user-defined function.

The function plotly_gauge_fl() is a user-defined function (UDF) that allows you to customize a plotly template to create a gauge chart.

The function accepts a few parameters to customize the gauge chart and returns a single-cell table containing plotly JSON. Optionally, you can render the data in an Azure Data Explorer dashboard tile or in a Real-Time dashboard tile. For more information, see Plotly (preview).

Prerequisite

Extract the required ‘gauge’ template from the publicly available PlotlyTemplate table. Copy this table from the Samples database to your database by running the following KQL command from your target database:

.set PlotlyTemplate <| cluster('help.kusto.windows.net').database('Samples').PlotlyTemplate

Syntax

plotly_gauge_fl(value, max_range, mode, chart_title, font_color, bar_color, bar_bg_color, tick_color, tick_width)

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| value | real | ✔️ | The number to be displayed. |
| max_range | real | | The maximum range of the gauge. |
| mode | string | | Specifies how the value is displayed on the graph. The default is ‘gauge+number’. |
| chart_title | string | | The chart title. The default is an empty title. |
| font_color | string | | The chart’s font color. The default is ‘black’. |
| bar_color | string | | The gauge’s filled bar color. The default is ‘green’. |
| bar_bg_color | string | | The gauge’s unfilled bar color. The default is ‘lightgreen’. |
| tick_color | string | | The gauge’s tick color. The default is ‘darkblue’. |
| tick_width | int | | The gauge’s tick width. The default is 1. |

Plotly gauge charts support many parameters, but this function exposes only the ones listed above. For more information, see the indicator traces reference.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let plotly_gauge_fl=(value:real, max_range:real=real(null), mode:string='gauge+number', chart_title:string='',font_color:string='black',
                    bar_color:string='green', bar_bg_color:string='lightgreen', tick_color:string='darkblue', tick_width:int=1)
{
    let gauge_chart = toscalar(PlotlyTemplate | where name == "gauge" | project plotly);
    print plotly = gauge_chart
    | extend plotly=replace_string(plotly, '$VALUE$', tostring(value))
    | extend plotly=replace_string(plotly, '$MAX_RANGE$', iff(isnull(max_range), 'null', tostring(max_range)))
    | extend plotly=replace_string(plotly, '$MODE$', mode)
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | extend plotly=replace_string(plotly, '$FONT_COLOR$', font_color)
    | extend plotly=replace_string(plotly, '$BAR_COLOR$', bar_color)
    | extend plotly=replace_string(plotly, '$BAR_BG_COLOR$', bar_bg_color)
    | extend plotly=replace_string(plotly, '$TICK_COLOR$', tick_color)
    | extend plotly=replace_string(plotly, '$TICK_WIDTH$', tostring(tick_width))
    | project plotly
};
// Write your query to use your function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Plotly", docstring = "Render gauge chart using plotly template")
plotly_gauge_fl(value:real, max_range:real=real(null), mode:string='gauge+number', chart_title:string='',font_color:string='black',
                    bar_color:string='green', bar_bg_color:string='lightgreen', tick_color:string='darkblue', tick_width:int=1)
{
    let gauge_chart = toscalar(PlotlyTemplate | where name == "gauge" | project plotly);
    print plotly = gauge_chart
    | extend plotly=replace_string(plotly, '$VALUE$', tostring(value))
    | extend plotly=replace_string(plotly, '$MAX_RANGE$', iff(isnull(max_range), 'null', tostring(max_range)))
    | extend plotly=replace_string(plotly, '$MODE$', mode)
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | extend plotly=replace_string(plotly, '$FONT_COLOR$', font_color)
    | extend plotly=replace_string(plotly, '$BAR_COLOR$', bar_color)
    | extend plotly=replace_string(plotly, '$BAR_BG_COLOR$', bar_bg_color)
    | extend plotly=replace_string(plotly, '$TICK_COLOR$', tick_color)
    | extend plotly=replace_string(plotly, '$TICK_WIDTH$', tostring(tick_width))
    | project plotly
}

Example

The following example shows how to call the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let plotly_gauge_fl=(value:real, max_range:real=real(null), mode:string='gauge+number', chart_title:string='',font_color:string='black',
                    bar_color:string='green', bar_bg_color:string='lightgreen', tick_color:string='darkblue', tick_width:int=1)
{
    let gauge_chart = toscalar(PlotlyTemplate | where name == "gauge" | project plotly);
    print plotly = gauge_chart
    | extend plotly=replace_string(plotly, '$VALUE$', tostring(value))
    | extend plotly=replace_string(plotly, '$MAX_RANGE$', iff(isnull(max_range), 'null', tostring(max_range)))
    | extend plotly=replace_string(plotly, '$MODE$', mode)
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | extend plotly=replace_string(plotly, '$FONT_COLOR$', font_color)
    | extend plotly=replace_string(plotly, '$BAR_COLOR$', bar_color)
    | extend plotly=replace_string(plotly, '$BAR_BG_COLOR$', bar_bg_color)
    | extend plotly=replace_string(plotly, '$TICK_COLOR$', tick_color)
    | extend plotly=replace_string(plotly, '$TICK_WIDTH$', tostring(tick_width))
    | project plotly
};
plotly_gauge_fl(value=180, chart_title='Speed', font_color='purple', tick_width=5)
| render plotly

Stored

plotly_gauge_fl(value=180, chart_title='Speed', font_color='purple', tick_width=5)
| render plotly

Output

The output is a Plotly JSON string that can be rendered in an Azure Data Explorer dashboard tile or in a Real-Time dashboard tile. For more information on creating dashboard tiles, see Visualize data with Azure Data Explorer dashboards and Real-Time dashboards.

Screenshot of gauge chart with random data.

5.29 - plotly_scatter3d_fl()

Learn how to use the plotly_scatter3d_fl() user-defined function.

The function plotly_scatter3d_fl() is a user-defined function (UDF) that allows you to customize a plotly template to create an interactive 3D scatter chart.

The function accepts a table containing the records to be rendered, the names of the x, y, z, and aggregation columns, and a chart title string. The function returns a single-cell table containing plotly JSON. Optionally, you can render the data in an Azure Data Explorer dashboard tile or in a Real-Time dashboard tile. For more information, see Plotly (preview).

Prerequisite

Extract the required ‘scatter3d’ template from the publicly available PlotlyTemplate table. Copy this table from the Samples database to your database by running the following KQL command from your target database:

.set PlotlyTemplate <| cluster('help.kusto.windows.net').database('Samples').PlotlyTemplate

Syntax

T | invoke plotly_scatter3d_fl(x_col, y_col, z_col, aggr_col [, chart_title ])

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| x_col | string | ✔️ | The name of the column for the X coordinate of the 3D plot. |
| y_col | string | ✔️ | The name of the column for the Y coordinate of the 3D plot. |
| z_col | string | ✔️ | The name of the column for the Z coordinate of the 3D plot. |
| aggr_col | string | ✔️ | The name of the grouping column. Records in the same group are rendered in a distinct color. |
| chart_title | string | | The chart title. The default is ‘3D Scatter chart’. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
    let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
    let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
    tbl_ex
    | serialize 
    | summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
    | summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
    | extend plotly = scatter3d_chart
    | extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
    | extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
    | extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
    | extend plotly=replace_string(plotly, '$X_NAME$', x_col)
    | extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
    | extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
    | extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
    | extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
    | extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | project plotly
};
// Write your query to use your function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Plotly", docstring = "Render 3D scatter chart using plotly template")
plotly_scatter3d_fl(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
    let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
    let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
    tbl_ex
    | serialize 
    | summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
    | summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
    | extend plotly = scatter3d_chart
    | extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
    | extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
    | extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
    | extend plotly=replace_string(plotly, '$X_NAME$', x_col)
    | extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
    | extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
    | extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
    | extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
    | extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | project plotly
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
    let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
    let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
    tbl_ex
    | serialize 
    | summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
    | summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
    | extend plotly = scatter3d_chart
    | extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
    | extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
    | extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
    | extend plotly=replace_string(plotly, '$X_NAME$', x_col)
    | extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
    | extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
    | extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
    | extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
    | extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | project plotly
};
Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
| render plotly

Stored

Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
| render plotly

Output

The output is a Plotly JSON string that can be rendered in an Azure Data Explorer dashboard tile or in a Real-Time dashboard tile. For more information on creating dashboard tiles, see Visualize data with Azure Data Explorer dashboards and Real-Time dashboards.

Screenshot of 3D scatter chart of a sample dataset.

You can rotate, zoom and hover over specific records:

Screenshot of rotated 3D scatter chart of a sample dataset.

5.30 - predict_fl()

This article describes the predict_fl() user-defined function.

The function predict_fl() is a user-defined function (UDF) that predicts using an existing trained machine learning model. This model was built using Scikit-learn, serialized to string, and saved in a standard table.

Syntax

T | invoke predict_fl(models_tbl, model_name, features_cols, pred_col)

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| models_tbl | string | ✔️ | The name of the table that contains all the serialized models. The table must have the following columns: name (the model name), timestamp (the time of model training), and model (a string representation of the serialized model). |
| model_name | string | ✔️ | The name of the specific model to use. |
| features_cols | dynamic | ✔️ | An array containing the names of the features columns that are used by the model for prediction. |
| pred_col | string | ✔️ | The name of the column that stores the predictions. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let predict_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:
        
        import pickle
        import binascii
        
        smodel = kargs["smodel"]
        features_cols = kargs["features_cols"]
        pred_col = kargs["pred_col"]
        bmodel = binascii.unhexlify(smodel)
        clf1 = pickle.loads(bmodel)
        df1 = df[features_cols]
        predictions = clf1.predict(df1)
        
        result = df
        result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
        
    ```;
    samples
    | evaluate python(typeof(*), code, kwargs)
};
// Write your code to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create function with (folder = "Packages\\ML", docstring = "Predict using ML model, build by Scikit-learn")
predict_fl(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:
        
        import pickle
        import binascii
        
        smodel = kargs["smodel"]
        features_cols = kargs["features_cols"]
        pred_col = kargs["pred_col"]
        bmodel = binascii.unhexlify(smodel)
        clf1 = pickle.loads(bmodel)
        df1 = df[features_cols]
        predictions = clf1.predict(df1)
        
        result = df
        result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
        
    ```;
    samples
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let predict_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:
        
        import pickle
        import binascii
        
        smodel = kargs["smodel"]
        features_cols = kargs["features_cols"]
        pred_col = kargs["pred_col"]
        bmodel = binascii.unhexlify(smodel)
        clf1 = pickle.loads(bmodel)
        df1 = df[features_cols]
        predictions = clf1.predict(df1)
        
        result = df
        result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
        
    ```;
    samples
    | evaluate python(typeof(*), code, kwargs)
};
//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection 
| where Test == 1
| extend pred_Occupancy=false
| invoke predict_fl(ML_Models, 'Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy

Stored

//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection 
| where Test == 1
| extend pred_Occupancy=false
| invoke predict_fl(ML_Models, 'Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy

Output

Occupancy | pred_Occupancy | n
TRUE | TRUE | 3006
FALSE | TRUE | 112
TRUE | FALSE | 15
FALSE | FALSE | 9284

Model asset

Use the following commands to get the sample dataset and the pre-trained model. These commands require the Python plugin to be enabled.

//dataset
.set OccupancyDetection <| cluster('help').database('Samples').OccupancyDetection

//model
.set ML_Models <| datatable(name:string, timestamp:datetime, model:string) [
'Occupancy', datetime(now), '800363736b6c6561726e2e6c696e6561725f6d6f64656c2e6c6f6769737469630a4c6f67697374696352656772657373696f6e0a7100298171017d710228580700000070656e616c7479710358020000006c32710458040000006475616c7105895803000000746f6c7106473f1a36e2eb1c432d5801000000437107473ff0000000000000580d0000006669745f696e746572636570747108885811000000696e746572636570745f7363616c696e6771094b01580c000000636c6173735f776569676874710a4e580c00000072616e646f6d5f7374617465710b4e5806000000736f6c766572710c58090000006c69626c696e656172710d58080000006d61785f69746572710e4b64580b0000006d756c74695f636c617373710f58030000006f767271105807000000766572626f736571114b00580a0000007761726d5f737461727471128958060000006e5f6a6f627371134b015808000000636c61737365735f7114636e756d70792e636f72652e6d756c746961727261790a5f7265636f6e7374727563740a7115636e756d70790a6e6461727261790a71164b00857117430162711887711952711a284b014b0285711b636e756d70790a64747970650a711c58020000006231711d4b004b0187711e52711f284b0358010000007c71204e4e4e4affffffff4affffffff4b007471216289430200017122747123625805000000636f65665f7124681568164b008571256818877126527127284b014b014b05867128681c5802000000663871294b004b0187712a52712b284b0358010000003c712c4e4e4e4affffffff4affffffff4b0074712d628943286a02e0d50687e0bfc6d7c974fa93a63fb3d3b8080e6e943ffceb15defdad713f14c3a76bd73202bf712e74712f62580a000000696e746572636570745f7130681568164b008571316818877132527133284b014b01857134682b894308f1e89f57711290bf71357471366258070000006e5f697465725f7137681568164b00857138681887713952713a284b014b0185713b681c58020000006934713c4b004b0187713d52713e284b03682c4e4e4e4affffffff4affffffff4b0074713f628943040c00000071407471416258100000005f736b6c6561726e5f76657273696f6e71425806000000302e31392e32714375622e'
]

5.31 - predict_onnx_fl()

This article describes the predict_onnx_fl() user-defined function.

The function predict_onnx_fl() is a user-defined function (UDF) that predicts using an existing trained machine learning model. This model has been converted to ONNX format, serialized to string, and saved in a standard table.
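
As with predict_fl(), the model is stored as a hexadecimal string. The following sketch illustrates one way to convert a trained scikit-learn model to ONNX and serialize it to that format; it assumes the skl2onnx package, and the estimator and dataset are illustrative assumptions only:

import binascii
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train any scikit-learn estimator (synthetic data used here for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

# Convert to ONNX, then hex-encode the serialized model, matching unhexlify() in predict_onnx_fl().
onnx_model = convert_sklearn(clf, initial_types=[('float_input', FloatTensorType([None, 5]))])
smodel = binascii.hexlify(onnx_model.SerializeToString()).decode('utf-8')

# 'smodel' can now be ingested into the models table (name, timestamp, model),
# for example under a name such as 'ONNX-Occupancy'.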

Syntax

T | invoke predict_onnx_fl(models_tbl, model_name, features_cols, pred_col)

Parameters

Name | Type | Required | Description
models_tbl | string | ✔️ | The name of the table that contains all serialized models. The table must have the following columns: name (the model name), timestamp (time of model training), and model (string representation of the serialized model).
model_name | string | ✔️ | The name of the specific model to use.
features_cols | dynamic | ✔️ | An array containing the names of the features columns that are used by the model for prediction.
pred_col | string | ✔️ | The name of the column that stores the predictions.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let predict_onnx_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:
    
    import binascii
    
    smodel = kargs["smodel"]
    features_cols = kargs["features_cols"]
    pred_col = kargs["pred_col"]
    bmodel = binascii.unhexlify(smodel)
    
    features_cols = kargs["features_cols"]
    pred_col = kargs["pred_col"]
    
    import onnxruntime as rt
    sess = rt.InferenceSession(bmodel)
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    df1 = df[features_cols]
    predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]
    
    result = df
    result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
    
    ```;
    samples | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\ML", docstring = "Predict using ONNX model")
predict_onnx_fl(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:
    
    import binascii
    
    smodel = kargs["smodel"]
    features_cols = kargs["features_cols"]
    pred_col = kargs["pred_col"]
    bmodel = binascii.unhexlify(smodel)
    
    features_cols = kargs["features_cols"]
    pred_col = kargs["pred_col"]
    
    import onnxruntime as rt
    sess = rt.InferenceSession(bmodel)
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    df1 = df[features_cols]
    predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]
    
    result = df
    result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
    
    ```;
    samples | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let predict_onnx_fl=(samples:(*), models_tbl:(name:string, timestamp:datetime, model:string), model_name:string, features_cols:dynamic, pred_col:string)
{
    let model_str = toscalar(models_tbl | where name == model_name | top 1 by timestamp desc | project model);
    let kwargs = bag_pack('smodel', model_str, 'features_cols', features_cols, 'pred_col', pred_col);
    let code = ```if 1:
    
    import binascii
    
    smodel = kargs["smodel"]
    features_cols = kargs["features_cols"]
    pred_col = kargs["pred_col"]
    bmodel = binascii.unhexlify(smodel)
    
    features_cols = kargs["features_cols"]
    pred_col = kargs["pred_col"]
    
    import onnxruntime as rt
    sess = rt.InferenceSession(bmodel)
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    df1 = df[features_cols]
    predictions = sess.run([label_name], {input_name: df1.values.astype(np.float32)})[0]
    
    result = df
    result[pred_col] = pd.DataFrame(predictions, columns=[pred_col])
    
    ```;
    samples | evaluate python(typeof(*), code, kwargs)
};
//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection 
| where Test == 1
| extend pred_Occupancy=bool(0)
| invoke predict_onnx_fl(ML_Models, 'ONNX-Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy

Stored

//
// Predicts room occupancy from sensors measurements, and calculates the confusion matrix
//
// Occupancy Detection is an open dataset from UCI Repository at https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
// It contains experimental data for binary classification of room occupancy from Temperature,Humidity,Light and CO2.
// Ground-truth labels were obtained from time stamped pictures that were taken every minute
//
OccupancyDetection 
| where Test == 1
| extend pred_Occupancy=bool(0)
| invoke predict_onnx_fl(ML_Models, 'ONNX-Occupancy', pack_array('Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio'), 'pred_Occupancy')
| summarize n=count() by Occupancy, pred_Occupancy

Output

Occupancy | pred_Occupancy | n
TRUE | TRUE | 3006
FALSE | TRUE | 112
TRUE | FALSE | 15
FALSE | FALSE | 9284

5.32 - quantize_fl()

This article describes the quantize_fl() user-defined function.

The function quantize_fl() is a user-defined function (UDF) that bins metric columns, quantizing their values into categorical labels based on the K-Means algorithm.
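
Internally the function relies on scikit-learn's KBinsDiscretizer with the kmeans strategy, as shown in the embedded Python code below. The following standalone sketch reproduces that binning on a small sample so you can see the bin edges and labels it produces; the sample values mirror the example at the end of this section:

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Three clusters of values: 1-5, 10-15, 20-25 (reshaped to a single feature column).
x = np.concatenate([np.arange(1, 6), np.arange(10, 16), np.arange(20, 26)]).reshape(-1, 1)

binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="kmeans")
bins = binner.fit_transform(x).astype(int).ravel()

labels = np.array(['Low', 'Med', 'High'])
print(np.round(binner.bin_edges_[0], 3))   # bin edges, e.g. [ 1.    7.75  17.5  25. ]
print(labels[bins])                        # categorical label per input value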

Syntax

T | invoke quantize_fl(num_bins, in_cols, out_cols [, labels ])

Parameters

Name | Type | Required | Description
num_bins | int | ✔️ | The required number of bins.
in_cols | dynamic | ✔️ | An array containing the names of the columns to quantize.
out_cols | dynamic | ✔️ | An array containing the names of the respective output columns for the binned values.
labels | dynamic | | An array containing the label names. If unspecified, bin ranges will be used.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let quantize_fl=(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic=dynamic(null))
{
    let kwargs = bag_pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
    let code = ```if 1:
        
        from sklearn.preprocessing import KBinsDiscretizer
        
        num_bins = kargs["num_bins"]
        in_cols = kargs["in_cols"]
        out_cols = kargs["out_cols"]
        labels = kargs["labels"]
        
        result = df
        binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")
        df_in = df[in_cols]
        bdata = binner.fit_transform(df_in)
        if labels is None:
            for i in range(len(out_cols)):    # loop on each column and convert it to binned labels
                ii = np.round(binner.bin_edges_[i], 3)
                labels = [str(ii[j-1]) + '-' + str(ii[j]) for j in range(1, num_bins+1)]
                result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))
        else:
            result[out_cols] = np.take(labels, bdata.astype(int))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create function with (folder = "Packages\\ML", docstring = "Binning metric columns")
quantize_fl(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic)
{
    let kwargs = bag_pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
    let code = ```if 1:
        
        from sklearn.preprocessing import KBinsDiscretizer
        
        num_bins = kargs["num_bins"]
        in_cols = kargs["in_cols"]
        out_cols = kargs["out_cols"]
        labels = kargs["labels"]
        
        result = df
        binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")
        df_in = df[in_cols]
        bdata = binner.fit_transform(df_in)
        if labels is None:
            for i in range(len(out_cols)):    # loop on each column and convert it to binned labels
                ii = np.round(binner.bin_edges_[i], 3)
                labels = [str(ii[j-1]) + '-' + str(ii[j]) for j in range(1, num_bins+1)]
                result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))
        else:
            result[out_cols] = np.take(labels, bdata.astype(int))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let quantize_fl=(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic=dynamic(null))
{
    let kwargs = bag_pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
    let code = ```if 1:
        
        from sklearn.preprocessing import KBinsDiscretizer
        
        num_bins = kargs["num_bins"]
        in_cols = kargs["in_cols"]
        out_cols = kargs["out_cols"]
        labels = kargs["labels"]
        
        result = df
        binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")
        df_in = df[in_cols]
        bdata = binner.fit_transform(df_in)
        if labels is None:
            for i in range(len(out_cols)):    # loop on each column and convert it to binned labels
                ii = np.round(binner.bin_edges_[i], 3)
                labels = [str(ii[j-1]) + '-' + str(ii[j]) for j in range(1, num_bins+1)]
                result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))
        else:
            result[out_cols] = np.take(labels, bdata.astype(int))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
//
union 
(range x from 1 to 5 step 1),
(range x from 10 to 15 step 1),
(range x from 20 to 25 step 1)
| extend x_label='', x_bin=''
| invoke quantize_fl(3, pack_array('x'), pack_array('x_label'), pack_array('Low', 'Med', 'High'))
| invoke quantize_fl(3, pack_array('x'), pack_array('x_bin'), dynamic(null))

Stored

union 
(range x from 1 to 5 step 1),
(range x from 10 to 15 step 1),
(range x from 20 to 25 step 1)
| extend x_label='', x_bin=''
| invoke quantize_fl(3, pack_array('x'), pack_array('x_label'), pack_array('Low', 'Med', 'High'))
| invoke quantize_fl(3, pack_array('x'), pack_array('x_bin'), dynamic(null))

Output

x | x_label | x_bin
1 | Low | 1.0-7.75
2 | Low | 1.0-7.75
3 | Low | 1.0-7.75
4 | Low | 1.0-7.75
5 | Low | 1.0-7.75
20 | High | 17.5-25.0
21 | High | 17.5-25.0
22 | High | 17.5-25.0
23 | High | 17.5-25.0
24 | High | 17.5-25.0
25 | High | 17.5-25.0
10 | Med | 7.75-17.5
11 | Med | 7.75-17.5
12 | Med | 7.75-17.5
13 | Med | 7.75-17.5
14 | Med | 7.75-17.5
15 | Med | 7.75-17.5

5.33 - series_clean_anomalies_fl()

Learn how to use the series_clean_anomalies_fl() function to clean anomalous points in a series.

Cleans anomalous points in a series.

The function series_clean_anomalies_fl() is a user-defined function (UDF) that takes a dynamic numerical array and another numerical array of anomalies as input, and replaces the anomalies in the input array with the interpolated value of their adjacent points.
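
The cleaning is done by replacing each anomalous point with a null value and then filling the gaps by linear interpolation, as the KQL definition below shows. The following pandas sketch, using illustrative toy data, demonstrates the same idea outside of KQL:

import numpy as np
import pandas as pd

y = pd.Series([1.0, 2.0, 30.0, 4.0, 5.0])   # 30.0 is the anomalous point
anomalies = np.array([0, 0, 1, 0, 0])        # non-zero marks an anomaly

# Replace anomalies with NaN, then interpolate linearly from the adjacent points.
y_clean = y.mask(anomalies != 0).interpolate(method='linear')
print(y_clean.tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0]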

Syntax

series_clean_anomalies_fl(y_series, anomalies)

Parameters

Name | Type | Required | Description
y_series | dynamic | ✔️ | The input array of numeric values.
anomalies | dynamic | ✔️ | The anomalies array containing either 0 for normal points or any other value for anomalous points.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_clean_anomalies_fl = (y_series:dynamic, anomalies:dynamic)
{
    let fnum = array_iff(series_not_equals(anomalies, 0), real(null), y_series);  //  replace anomalies with null values
    series_fill_linear(fnum)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Replace anomalies by interpolated value", skipvalidation = "true")
series_clean_anomalies_fl(y_series:dynamic, anomalies:dynamic)
{
    let fnum = array_iff(series_not_equals(anomalies, 0), real(null), y_series);  //  replace anomalies with null values
    series_fill_linear(fnum)
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_clean_anomalies_fl = (y_series:dynamic, anomalies:dynamic)
{
    let fnum = array_iff(series_not_equals(anomalies, 0), real(null), y_series);  //  replace anomalies with null values
    series_fill_linear(fnum)
}
;
let min_t = datetime(2016-08-29);
let max_t = datetime(2016-08-31);
demo_make_series1
| make-series num=count() on TimeStamp from min_t to max_t step 20m by OsVer
| extend anomalies = series_decompose_anomalies(num, 0.8)
| extend num_c = series_clean_anomalies_fl(num, anomalies)
| render anomalychart with (anomalycolumns=anomalies)

Stored

let min_t = datetime(2016-08-29);
let max_t = datetime(2016-08-31);
demo_make_series1
| make-series num=count() on TimeStamp from min_t to max_t step 20m by OsVer
| extend anomalies = series_decompose_anomalies(num, 0.8)
| extend num_c = series_clean_anomalies_fl(num, anomalies)
| render anomalychart with (anomalycolumns=anomalies)

Output

Graph of a time series with anomalies before and after cleaning.

5.34 - series_cosine_similarity_fl()

This article describes series_cosine_similarity_fl() user-defined function.

Calculates the cosine similarity of two numerical vectors.

The function series_cosine_similarity_fl() is a user-defined function (UDF) that takes an expression containing two dynamic numerical arrays as input and calculates their cosine similarity.
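
The calculation is the standard one: the dot product of the two vectors divided by the product of their norms. The following numpy sketch, using the same vectors as the example at the end of this section, shows the equivalent computation:

import numpy as np

def cosine_similarity(v1, v2):
    # Dot product divided by the product of the vector sizes (L2 norms).
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

s1 = np.array([0.0, 1.0])
s2 = np.array([np.sqrt(2), np.sqrt(2)])
angle = np.degrees(np.arccos(cosine_similarity(s1, s2)))
print(angle)   # 45.0 degrees, matching the example below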

Syntax

series_cosine_similarity_fl(vec1, vec2, [ vec1_size [, vec2_size ]])

Parameters

Name | Type | Required | Description
vec1 | dynamic | ✔️ | An array of numeric values.
vec2 | dynamic | ✔️ | An array of numeric values that is the same length as vec1.
vec1_size | real | | The size of vec1. This is equivalent to the square root of the dot product of the vector with itself.
vec2_size | real | | The size of vec2.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_cosine_similarity_fl=(vec1:dynamic, vec2:dynamic, vec1_size:real=double(null), vec2_size:real=double(null))
{
    let dp = series_dot_product(vec1, vec2);
    let v1l = iff(isnull(vec1_size), sqrt(series_dot_product(vec1, vec1)), vec1_size);
    let v2l = iff(isnull(vec2_size), sqrt(series_dot_product(vec2, vec2)), vec2_size);
    dp/(v1l*v2l)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate the Cosine similarity of 2 numerical arrays")
series_cosine_similarity_fl(vec1:dynamic, vec2:dynamic, vec1_size:real=double(null), vec2_size:real=double(null))
{
    let dp = series_dot_product(vec1, vec2);
    let v1l = iff(isnull(vec1_size), sqrt(series_dot_product(vec1, vec1)), vec1_size);
    let v2l = iff(isnull(vec2_size), sqrt(series_dot_product(vec2, vec2)), vec2_size);
    dp/(v1l*v2l)
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_cosine_similarity_fl=(vec1:dynamic, vec2:dynamic, vec1_size:real=double(null), vec2_size:real=double(null))
{
    let dp = series_dot_product(vec1, vec2);
    let v1l = iff(isnull(vec1_size), sqrt(series_dot_product(vec1, vec1)), vec1_size);
    let v2l = iff(isnull(vec2_size), sqrt(series_dot_product(vec2, vec2)), vec2_size);
    dp/(v1l*v2l)
};
let s1=pack_array(0, 1);
let s2=pack_array(sqrt(2), sqrt(2));
print angle=acos(series_cosine_similarity_fl(s1, s2))/(2*pi())*360

Stored

let s1=pack_array(0, 1);
let s2=pack_array(sqrt(2), sqrt(2));
print angle=acos(series_cosine_similarity_fl(s1, s2))/(2*pi())*360

Output

angle
45

5.35 - series_dbl_exp_smoothing_fl()

This article describes the series_dbl_exp_smoothing_fl() user-defined function.

Applies a double exponential smoothing filter on a series.

The function series_dbl_exp_smoothing_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies a double exponential smoothing filter. When there is a trend in the series, this function is superior to the series_exp_smoothing_fl() function, which implements a basic exponential smoothing filter.
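
The filter corresponds to the level component of Holt's double exponential smoothing recursion, which the UDF expresses compactly as an IIR filter (the series_iir() call below). The following Python sketch shows the recursion itself; the initialization used here is an illustrative assumption, so the first few points can differ from the UDF's output:

def dbl_exp_smoothing(y, alpha=0.5, beta=0.5):
    level, trend = y[0], 0.0                  # assumed initialization
    smoothed = [level]
    for yt in y[1:]:
        prev_level = level
        level = alpha * yt + (1 - alpha) * (level + trend)        # level update
        trend = beta * (level - prev_level) + (1 - beta) * trend  # trend update
        smoothed.append(level)
    return smoothed

print(dbl_exp_smoothing([1, 2, 4, 8, 16], alpha=0.2, beta=0.4))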

Syntax

series_dbl_exp_smoothing_fl(y_series [, alpha [, beta ]])

Parameters

Name | Type | Required | Description
y_series | dynamic | ✔️ | An array of numeric values.
alpha | real | | A value in the range [0-1] that specifies the weight of the last point vs. the weight of the previous points, which is 1 - alpha. The default is 0.5.
beta | real | | A value in the range [0-1] that specifies the weight of the last slope vs. the weight of the previous slopes, which is 1 - beta. The default is 0.5.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_dbl_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5, beta:double=0.5)
{
    series_iir(y_series, pack_array(alpha, alpha*(beta-1)), pack_array(1, alpha*(1+beta)-2, 1-alpha))
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Double exponential smoothing for a series")
series_dbl_exp_smoothing_fl(y_series:dynamic, alpha:double=0.5, beta:double=0.5)
{
    series_iir(y_series, pack_array(alpha, alpha*(beta-1)), pack_array(1, alpha*(1+beta)-2, 1-alpha))
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_dbl_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5, beta:double=0.5)
{
    series_iir(y_series, pack_array(alpha, alpha*(beta-1)), pack_array(1, alpha*(1+beta)-2, 1-alpha))
};
range x from 1 to 50 step 1
| extend y = x + rand()*10
| summarize x = make_list(x), y = make_list(y)
| extend dbl_exp_smooth_y = series_dbl_exp_smoothing_fl(y, 0.2, 0.4) 
| render linechart

Stored

range x from 1 to 50 step 1
| extend y = x + rand()*10
| summarize x = make_list(x), y = make_list(y)
| extend dbl_exp_smooth_y = series_dbl_exp_smoothing_fl(y, 0.2, 0.4) 
| render linechart

Output

Graph showing double exponential smoothing of artificial series.

5.36 - series_dot_product_fl()

This article describes series_dot_product_fl() user-defined function.

Calculates the dot product of two numerical vectors.

The function series_dot_product_fl() is a user-defined function (UDF) that takes an expression containing two dynamic numerical arrays as input and calculates their dot product.
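
The implementation multiplies the vectors element by element, computes a cumulative sum with series_iir(), and takes the last element of that sum. The following numpy sketch, using the same vectors as the first row of the example below, shows the equivalent computation:

import numpy as np

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 5.0, 6.0])

# Element-wise product followed by a cumulative sum (the series_iir(..., [1], [1,-1]) step in KQL).
cum_sum = np.cumsum(v1 * v2)
print(cum_sum[-1])          # 32.0, the dot product
print(np.dot(v1, v2))       # same result, for reference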

Syntax

series_dot_product_fl(vec1, vec2)

Parameters

Name | Type | Required | Description
vec1 | dynamic | ✔️ | An array of numeric values.
vec2 | dynamic | ✔️ | An array of numeric values that is the same length as vec1.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_dot_product_fl=(vec1:dynamic, vec2:dynamic)
{
    let elem_prod = series_multiply(vec1, vec2);
    let cum_sum = series_iir(elem_prod, dynamic([1]), dynamic([1,-1]));
    todouble(cum_sum[-1])
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate the dot product of 2 numerical arrays")
series_dot_product_fl(vec1:dynamic, vec2:dynamic)
{
    let elem_prod = series_multiply(vec1, vec2);
    let cum_sum = series_iir(elem_prod, dynamic([1]), dynamic([1,-1]));
    todouble(cum_sum[-1])
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_dot_product_fl=(vec1:dynamic, vec2:dynamic)
{
    let elem_prod = series_multiply(vec1, vec2);
    let cum_sum = series_iir(elem_prod, dynamic([1]), dynamic([1,-1]));
    todouble(cum_sum[-1])
};
union
(print 1 | project v1=range(1, 3, 1), v2=range(4, 6, 1)),
(print 1 | project v1=range(11, 13, 1), v2=range(14, 16, 1))
| extend v3=series_dot_product_fl(v1, v2)

Stored

union
(print 1 | project v1=range(1, 3, 1), v2=range(4, 6, 1)),
(print 1 | project v1=range(11, 13, 1), v2=range(14, 16, 1))
| extend v3=series_dot_product_fl(v1, v2)

Output

Table showing the result of dot product of 2 vectors using user-defined function series_dot_product_fl.

5.37 - series_downsample_fl()

This article describes the series_downsample_fl() user-defined function.

The function series_downsample_fl() is a user-defined function (UDF) that downsamples a time series by an integer factor. This function takes a table containing multiple time series (dynamic numerical array), and downsamples each series. The output contains both the coarser series and its respective times array. To avoid aliasing, the function applies a simple low pass filter on each series before subsampling.
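
Conceptually, each series is first smoothed with a simple moving average (the low pass filter) and then every sampling_factor-th point is kept. The following numpy sketch is a rough standalone approximation of these two steps; the convolution mode and the sampling offset are assumptions chosen to mimic the KQL logic, not an exact reproduction of series_fir():

import numpy as np

def downsample(y, factor):
    kernel = np.ones(factor) / factor                 # simple normalized low pass filter
    smoothed = np.convolve(y, kernel, mode='same')    # approximates series_fir(..., true, true)
    offset = int(np.ceil(factor / 2.0)) - 1           # mirrors 'rid % sampling_factor == ceiling(...)-1'
    return smoothed[offset::factor]                   # keep every factor-th point

y = np.arange(1, 17, dtype=float)
print(downsample(y, 4))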

Syntax

T | invoke series_downsample_fl(t_col, y_col, ds_t_col, ds_y_col, sampling_factor)

Parameters

Name | Type | Required | Description
t_col | string | ✔️ | The name of the column that contains the time axis of the series to downsample.
y_col | string | ✔️ | The name of the column that contains the series to downsample.
ds_t_col | string | ✔️ | The name of the column to store the downsampled time axis of each series.
ds_y_col | string | ✔️ | The name of the column to store the downsampled series.
sampling_factor | int | ✔️ | An integer specifying the required downsampling factor.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_downsample_fl=(tbl:(*), t_col:string, y_col:string, ds_t_col:string, ds_y_col:string, sampling_factor:int)
{
    tbl
    | extend _t_ = column_ifexists(t_col, dynamic(0)), _y_ = column_ifexists(y_col, dynamic(0))
    | extend _y_ = series_fir(_y_, repeat(1, sampling_factor), true, true)    //  apply a simple low pass filter before sub-sampling
    | mv-apply _t_ to typeof(DateTime), _y_ to typeof(double) on
    (extend rid=row_number()-1
    | where rid % sampling_factor == ceiling(sampling_factor/2.0)-1                    //  sub-sampling
    | summarize _t_ = make_list(_t_), _y_ = make_list(_y_))
    | extend cols = bag_pack(ds_t_col, _t_, ds_y_col, _y_)
    | project-away _t_, _y_
    | evaluate bag_unpack(cols)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Downsampling a series by an integer factor")
series_downsample_fl(tbl:(*), t_col:string, y_col:string, ds_t_col:string, ds_y_col:string, sampling_factor:int)
{
    tbl
    | extend _t_ = column_ifexists(t_col, dynamic(0)), _y_ = column_ifexists(y_col, dynamic(0))
    | extend _y_ = series_fir(_y_, repeat(1, sampling_factor), true, true)    //  apply a simple low pass filter before sub-sampling
    | mv-apply _t_ to typeof(DateTime), _y_ to typeof(double) on
    (extend rid=row_number()-1
    | where rid % sampling_factor == ceiling(sampling_factor/2.0)-1                    //  sub-sampling
    | summarize _t_ = make_list(_t_), _y_ = make_list(_y_))
    | extend cols = bag_pack(ds_t_col, _t_, ds_y_col, _y_)
    | project-away _t_, _y_
    | evaluate bag_unpack(cols)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_downsample_fl=(tbl:(*), t_col:string, y_col:string, ds_t_col:string, ds_y_col:string, sampling_factor:int)
{
    tbl
    | extend _t_ = column_ifexists(t_col, dynamic(0)), _y_ = column_ifexists(y_col, dynamic(0))
    | extend _y_ = series_fir(_y_, repeat(1, sampling_factor), true, true)    //  apply a simple low pass filter before sub-sampling
    | mv-apply _t_ to typeof(DateTime), _y_ to typeof(double) on
    (extend rid=row_number()-1
    | where rid % sampling_factor == ceiling(sampling_factor/2.0)-1                    //  sub-sampling
    | summarize _t_ = make_list(_t_), _y_ = make_list(_y_))
    | extend cols = bag_pack(ds_t_col, _t_, ds_y_col, _y_)
    | project-away _t_, _y_
    | evaluate bag_unpack(cols)
};
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| invoke series_downsample_fl('TimeStamp', 'num', 'coarse_TimeStamp', 'coarse_num', 4)
| render timechart with(xcolumn=coarse_TimeStamp, ycolumns=coarse_num)

Stored

demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| invoke series_downsample_fl('TimeStamp', 'num', 'coarse_TimeStamp', 'coarse_num', 4)
| render timechart with(xcolumn=coarse_TimeStamp, ycolumns=coarse_num)

Output

The time series downsampled by 4: Graph showing downsampling of a time series.

For reference, here is the original time series (before downsampling):

demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| render timechart with(xcolumn=TimeStamp, ycolumns=num)

Graph showing the original time series, before downsampling

5.38 - series_exp_smoothing_fl()

This article describes series_exp_smoothing_fl() user-defined function.

Applies a basic exponential smoothing filter on a series.

The function series_exp_smoothing_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies a basic exponential smoothing filter.
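
The filter implements the recursion s[i] = alpha * y[i] + (1 - alpha) * s[i-1], which is exactly the series_iir() call in the definition below. The following Python sketch shows the recursion directly; the initialization s[0] = y[0] is an illustrative assumption, so the first point can differ from the UDF's output:

def exp_smoothing(y, alpha=0.5):
    s = [y[0]]                                   # assumed initialization s[0] = y[0]
    for yt in y[1:]:
        s.append(alpha * yt + (1 - alpha) * s[-1])   # weight the last point vs. the history
    return s

print(exp_smoothing([0, 10, 0, 10, 0], alpha=0.4))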

Syntax

series_exp_smoothing_fl(y_series [, alpha ])

Parameters

Name | Type | Required | Description
y_series | dynamic | ✔️ | An array cell of numeric values.
alpha | real | | A value in the range [0-1] that specifies the weight of the last point vs. the weight of the previous points, which is 1 - alpha. The default is 0.5.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5)
{
    series_iir(y_series, pack_array(alpha), pack_array(1, alpha-1))
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Basic exponential smoothing for a series")
series_exp_smoothing_fl(y_series:dynamic, alpha:double=0.5)
{
    series_iir(y_series, pack_array(alpha), pack_array(1, alpha-1))
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_exp_smoothing_fl = (y_series:dynamic, alpha:double=0.5)
{
    series_iir(y_series, pack_array(alpha), pack_array(1, alpha-1))
};
range x from 1 to 50 step 1
| extend y = x % 10
| summarize x = make_list(x), y = make_list(y)
| extend exp_smooth_y = series_exp_smoothing_fl(y, 0.4) 
| render linechart

Stored

range x from 1 to 50 step 1
| extend y = x % 10
| summarize x = make_list(x), y = make_list(y)
| extend exp_smooth_y = series_exp_smoothing_fl(y, 0.4) 
| render linechart

Output

Graph showing exponential smoothing of artificial series.

5.39 - series_fbprophet_forecast_fl()

This article describes the series_fbprophet_forecast_fl() user-defined function.

The function series_fbprophet_forecast_fl() is a user-defined function (UDF) that takes an expression containing a time series as input, and predicts the values of the last trailing points using the Prophet algorithm. The function returns both the forecasted points and their confidence intervals. This function is a Kusto Query Language (KQL) wrapper to the Prophet() class, and exposes only the parameters that are mandatory for prediction. Feel free to modify your copy to support more parameters, such as holidays, change points, Fourier order, and so on.
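
At its core, the function builds a Prophet model per series, fits it on all but the last points, and predicts over the full time range. The following standalone sketch shows those core calls on a toy series; the synthetic data and the 2-hour frequency are illustrative assumptions, and it requires the prophet package installed locally:

import pandas as pd
from prophet import Prophet

# Toy periodic series with a 'ds' (timestamp) and 'y' (value) column, as Prophet expects.
df = pd.DataFrame({
    'ds': pd.date_range('2017-01-01', periods=48, freq='2h'),
    'y':  [i % 12 for i in range(48)],
})

m = Prophet()
m.fit(df[:-12])                             # train on all but the last 12 points
forecast = m.predict(df[['ds']])            # predict over the full time range
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())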

Syntax

T | invoke series_fbprophet_forecast_fl(ts_series, y_series, y_pred_series, [ points ], [ y_pred_low_series ], [ y_pred_high_series ])

Parameters

Name | Type | Required | Description
ts_series | string | ✔️ | The name of the input table column containing the time stamps of the series to predict.
y_series | string | ✔️ | The name of the input table column containing the values of the series to predict.
y_pred_series | string | ✔️ | The name of the column to store the predicted series.
points | int | ✔️ | The number of points at the end of the series to predict (forecast). These points are excluded from the learning (regression) process. The default is 0.
y_pred_low_series | string | | The name of the column to store the series of the lowest values of the confidence interval. Omit if the confidence interval isn't needed.
y_pred_high_series | string | | The name of the column to store the series of the highest values of the confidence interval. Omit if the confidence interval isn't needed.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_fbprophet_forecast_fl=(tbl:(*), ts_series:string, y_series:string, y_pred_series:string, points:int=0, y_pred_low_series:string='', y_pred_high_series:string='')
{
    let kwargs = bag_pack('ts_series', ts_series, 'y_series', y_series, 'y_pred_series', y_pred_series, 'points', points, 'y_pred_low_series', y_pred_low_series, 'y_pred_high_series', y_pred_high_series);
    let code = ```if 1:
        from sandbox_utils import Zipackage
        Zipackage.install("prophet.zip")
        ts_series = kargs["ts_series"]
        y_series = kargs["y_series"]
        y_pred_series = kargs["y_pred_series"]
        points = kargs["points"]
        y_pred_low_series = kargs["y_pred_low_series"]
        y_pred_high_series = kargs["y_pred_high_series"]
        result = df
        sr = pd.Series(df[y_pred_series])
        if y_pred_low_series != '':
            srl = pd.Series(df[y_pred_low_series])
        if y_pred_high_series != '':
            srh = pd.Series(df[y_pred_high_series])
        from prophet import Prophet
        df1 = pd.DataFrame(columns=["ds", "y"])
        for i in range(df.shape[0]):
            df1["ds"] = pd.to_datetime(df[ts_series][i])
            df1["ds"] = df1["ds"].dt.tz_convert(None)
            df1["y"] = df[y_series][i]
            df2 = df1[:-points]
            m = Prophet()
            m.fit(df2)
            future = df1[["ds"]]
            forecast = m.predict(future)
            sr[i] = list(forecast["yhat"])
            if y_pred_low_series != '':
                srl[i] = list(forecast["yhat_lower"])
            if y_pred_high_series != '':
                srh[i] = list(forecast["yhat_upper"])
        result[y_pred_series] = sr
        if y_pred_low_series != '':
            result[y_pred_low_series] = srl
        if y_pred_high_series != '':
            result[y_pred_high_series] = srh
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs
, external_artifacts=bag_pack('prophet.zip', 'https://artifactswestusnew.blob.core.windows.net/public/prophet-1.1.5.zip?*** YOUR SAS TOKEN ***'))
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Time Series Forecast using Facebook fbprophet package")
series_fbprophet_forecast_fl(tbl:(*), ts_series:string, y_series:string, y_pred_series:string, points:int=0, y_pred_low_series:string='', y_pred_high_series:string='')
{
    let kwargs = bag_pack('ts_series', ts_series, 'y_series', y_series, 'y_pred_series', y_pred_series, 'points', points, 'y_pred_low_series', y_pred_low_series, 'y_pred_high_series', y_pred_high_series);
    let code = ```if 1:
        from sandbox_utils import Zipackage
        Zipackage.install("prophet.zip")
        ts_series = kargs["ts_series"]
        y_series = kargs["y_series"]
        y_pred_series = kargs["y_pred_series"]
        points = kargs["points"]
        y_pred_low_series = kargs["y_pred_low_series"]
        y_pred_high_series = kargs["y_pred_high_series"]
        result = df
        sr = pd.Series(df[y_pred_series])
        if y_pred_low_series != '':
            srl = pd.Series(df[y_pred_low_series])
        if y_pred_high_series != '':
            srh = pd.Series(df[y_pred_high_series])
        from prophet import Prophet
        df1 = pd.DataFrame(columns=["ds", "y"])
        for i in range(df.shape[0]):
            df1["ds"] = pd.to_datetime(df[ts_series][i])
            df1["ds"] = df1["ds"].dt.tz_convert(None)
            df1["y"] = df[y_series][i]
            df2 = df1[:-points]
            m = Prophet()
            m.fit(df2)
            future = df1[["ds"]]
            forecast = m.predict(future)
            sr[i] = list(forecast["yhat"])
            if y_pred_low_series != '':
                srl[i] = list(forecast["yhat_lower"])
            if y_pred_high_series != '':
                srh[i] = list(forecast["yhat_upper"])
        result[y_pred_series] = sr
        if y_pred_low_series != '':
            result[y_pred_low_series] = srl
        if y_pred_high_series != '':
            result[y_pred_high_series] = srh
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs
, external_artifacts=bag_pack('prophet.zip', 'https://artifactswestusnew.blob.core.windows.net/public/prophet-1.1.5.zip?*** YOUR SAS TOKEN ***'))
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_fbprophet_forecast_fl=(tbl:(*), ts_series:string, y_series:string, y_pred_series:string, points:int=0, y_pred_low_series:string='', y_pred_high_series:string='')
{
    let kwargs = bag_pack('ts_series', ts_series, 'y_series', y_series, 'y_pred_series', y_pred_series, 'points', points, 'y_pred_low_series', y_pred_low_series, 'y_pred_high_series', y_pred_high_series);
    let code = ```if 1:
        from sandbox_utils import Zipackage
        Zipackage.install("prophet.zip")
        ts_series = kargs["ts_series"]
        y_series = kargs["y_series"]
        y_pred_series = kargs["y_pred_series"]
        points = kargs["points"]
        y_pred_low_series = kargs["y_pred_low_series"]
        y_pred_high_series = kargs["y_pred_high_series"]
        result = df
        sr = pd.Series(df[y_pred_series])
        if y_pred_low_series != '':
            srl = pd.Series(df[y_pred_low_series])
        if y_pred_high_series != '':
            srh = pd.Series(df[y_pred_high_series])
        from prophet import Prophet
        df1 = pd.DataFrame(columns=["ds", "y"])
        for i in range(df.shape[0]):
            df1["ds"] = pd.to_datetime(df[ts_series][i])
            df1["ds"] = df1["ds"].dt.tz_convert(None)
            df1["y"] = df[y_series][i]
            df2 = df1[:-points]
            m = Prophet()
            m.fit(df2)
            future = df1[["ds"]]
            forecast = m.predict(future)
            sr[i] = list(forecast["yhat"])
            if y_pred_low_series != '':
                srl[i] = list(forecast["yhat_lower"])
            if y_pred_high_series != '':
                srh[i] = list(forecast["yhat_upper"])
        result[y_pred_series] = sr
        if y_pred_low_series != '':
            result[y_pred_low_series] = srl
        if y_pred_high_series != '':
            result[y_pred_high_series] = srh
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs
, external_artifacts=bag_pack('prophet.zip', 'https://artifactswestusnew.blob.core.windows.net/public/prophet-1.1.5.zip?*** YOUR SAS TOKEN ***'))
};
//
//  Forecasting 3 time series using fbprophet, compare to forecasting using the native function series_decompose_forecast()
//
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid 
| extend pred_num_native = series_decompose_forecast(num, toint(horizon/dt))
| extend pred_num=dynamic(null), pred_num_lower=dynamic(null), pred_num_upper=dynamic(null)
| invoke series_fbprophet_forecast_fl('TimeStamp', 'num', 'pred_num', toint(horizon/dt), 'pred_num_lower', 'pred_num_upper')
| render timechart 

Stored

//
//  Forecasting 3 time series using fbprophet, compare to forecasting using the native function series_decompose_forecast()
//
let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid 
| extend pred_num_native = series_decompose_forecast(num, toint(horizon/dt))
| extend pred_num=dynamic(null), pred_num_lower=dynamic(null), pred_num_upper=dynamic(null)
| invoke series_fbprophet_forecast_fl('TimeStamp', 'num', 'pred_num', toint(horizon/dt), 'pred_num_lower', 'pred_num_upper')
| render timechart 

Output

Graph showing forecasting few time series.

5.40 - series_fit_lowess_fl()

This article describes the series_fit_lowess_fl() user-defined function.

The function series_fit_lowess_fl() is a user-defined function (UDF) that applies a LOWESS regression on a series. This function takes a table with multiple series (dynamic numerical arrays) and generates a LOWESS Curve, which is a smoothed version of the original series.
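
The fit is performed by the LOWESS implementation in statsmodels, where the fit_size nearest points translate to the fraction frac = fit_size / len(y), as in the embedded Python code below. The following standalone sketch, using toy data, shows the equivalent call:

import numpy as np
import statsmodels.api as sm

x = np.arange(1, 51)
y = x % 10 + np.random.rand(50)   # toy sawtooth series with noise
fit_size = 9

# LOWESS smoothing: each point is fitted from roughly its fit_size closest neighbors.
smoothed = sm.nonparametric.lowess(y, x, frac=fit_size / len(y), return_sorted=False)
print(smoothed[:5])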

Syntax

T | invoke series_fit_lowess_fl(y_series, y_fit_series, [ fit_size ], [ x_series ], [ x_istime ])

Parameters

Name | Type | Required | Description
y_series | string | ✔️ | The name of the input table column containing the dependent variable. This column is the series to fit.
y_fit_series | string | ✔️ | The name of the column to store the fitted series.
fit_size | int | | For each point, the local regression is applied on its respective fit_size closest points. The default is 5.
x_series | string | | The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional, and is needed only for unevenly spaced series. The default value is an empty string, as x is redundant for the regression of an evenly spaced series.
x_istime | bool | | This boolean parameter is needed only if x_series is specified and it's a vector of datetime. The default is false.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_size = kargs["fit_size"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        import statsmodels.api as sm
        def lowess_fit(ts_row, x_col, y_col, fsize):
            y = ts_row[y_col]
            fraction = fsize/len(y)
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            lowess = sm.nonparametric.lowess
            z = lowess(y, x, return_sorted=False, frac=fraction)
            return list(z)
        result = df
        result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Fits a local polynomial using LOWESS method to a series")
series_fit_lowess_fl(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_size = kargs["fit_size"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        import statsmodels.api as sm
        def lowess_fit(ts_row, x_col, y_col, fsize):
            y = ts_row[y_col]
            fraction = fsize/len(y)
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            lowess = sm.nonparametric.lowess
            z = lowess(y, x, return_sorted=False, frac=fraction)
            return list(z)
        result = df
        result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
}

Examples

The following examples use the invoke operator to run the function.

LOWESS regression on regular time series

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_size = kargs["fit_size"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        import statsmodels.api as sm
        def lowess_fit(ts_row, x_col, y_col, fsize):
            y = ts_row[y_col]
            fraction = fsize/len(y)
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            lowess = sm.nonparametric.lowess
            z = lowess(y, x, return_sorted=False, frac=fraction)
            return list(z)
        result = df
        result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
//
// Apply 9 points LOWESS regression on regular time series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9)
| render timechart

Stored

//
// Apply 9 points LOWESS regression on regular time series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9)
| render timechart

Output

Graph showing nine points LOWESS fit to a regular time series.

Test irregular time series

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_size = kargs["fit_size"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        import statsmodels.api as sm
        def lowess_fit(ts_row, x_col, y_col, fsize):
            y = ts_row[y_col]
            fraction = fsize/len(y)
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            lowess = sm.nonparametric.lowess
            z = lowess(y, x, return_sorted=False, frac=fraction)
            return list(z)
        result = df
        result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-1d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   //  delete every 6th hour to create irregular time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9, 'TimeStamp', True)
| render timechart 

Stored

let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-1d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   //  delete every 6th hour to create irregular time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9, 'TimeStamp', True)
| render timechart 

Output

Graph showing nine points LOWESS fit to an irregular time series.

Compare LOWESS versus polynomial fit

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_size = kargs["fit_size"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        import statsmodels.api as sm
        def lowess_fit(ts_row, x_col, y_col, fsize):
            y = ts_row[y_col]
            fraction = fsize/len(y)
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            lowess = sm.nonparametric.lowess
            z = lowess(y, x, return_sorted=False, frac=fraction)
            return list(z)
        result = df
        result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_lowess = dynamic(null)
| invoke series_fit_lowess_fl('y', 'y_lowess', 15, 'x')
| extend series_fit_poly(y, x, 5)
| project x, y, y_lowess, y_polynomial=series_fit_poly_y_poly_fit
| render linechart

Stored

range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_lowess = dynamic(null)
| invoke series_fit_lowess_fl('y', 'y_lowess', 15, 'x')
| extend series_fit_poly(y, x, 5)
| project x, y, y_lowess, y_polynomial=series_fit_poly_y_poly_fit
| render linechart

Output

Graphs of LOWESS vs polynomial fit for a fifth order polynomial with noise on x & y axes

5.41 - series_fit_poly_fl()

This article describes the series_fit_poly_fl() user-defined function.

The function series_fit_poly_fl() is a user-defined function (UDF) that applies a polynomial regression on a series. This function takes a table containing multiple series (dynamic numerical arrays) and generates the best fit high-order polynomial for each series using polynomial regression. This function returns both the polynomial coefficients and the interpolated polynomial over the range of the series.
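
At its core, the embedded Python code fits each series with numpy. The following standalone sketch (an illustration with hypothetical variable names, not the UDF itself) shows the same idea for a single series: np.polyfit finds the best fit coefficients, ordered from the highest degree down to the constant term, and np.poly1d evaluates the fitted polynomial over the series range.

import numpy as np

# Hypothetical single-series example of the fit the UDF performs per row.
y = np.array([2.0, 4.1, 8.2, 15.9, 31.5])   # the series to fit
x = np.arange(len(y)) + 1                    # evenly spaced series: x = [1..len(y)]
coeff = np.polyfit(x, y, deg=2)              # best fit coefficients, highest degree first
y_fit = np.poly1d(coeff)(x)                  # interpolated polynomial over the series range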

Syntax

T | invoke series_fit_poly_fl(y_series, y_fit_series, fit_coeff, degree, [ x_series ], [ x_istime ])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| y_series | string | ✔️ | The name of the input table column containing the dependent variable. That is, the series to fit. |
| y_fit_series | string | ✔️ | The name of the column to store the best fit series. |
| fit_coeff | string | ✔️ | The name of the column to store the best fit polynomial coefficients. |
| degree | int | ✔️ | The required order of the polynomial to fit. For example, 1 for linear regression, 2 for quadratic regression, and so on. |
| x_series | string | | The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional, and is needed only for unevenly spaced series. The default value is an empty string, as x is redundant for the regression of an evenly spaced series. |
| x_istime | bool | | This parameter is needed only if x_series is specified and it's a vector of datetime. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        
        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff
        
        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Fit a polynomial of a specified degree to a series")
series_fit_poly_fl(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=false)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        
        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff
        
        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
}

Examples

The following examples use the invoke operator to run the function.

Fit fifth order polynomial to a regular time series

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        
        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff
        
        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
//
// Fit fifth order polynomial to a regular (evenly spaced) time series, created with make-series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null), fnum1 = dynamic(null), coeff1=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 5)
| render timechart with(ycolumns=num, fnum)

Stored

//
// Fit fifth order polynomial to a regular (evenly spaced) time series, created with make-series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null), fnum1 = dynamic(null), coeff1=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 5)
| render timechart with(ycolumns=num, fnum)

Output

Graph showing fifth order polynomial fit to a regular time series.

Test irregular time series

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        
        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff
        
        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-2d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   //  delete every 6th hour to create unevenly spaced time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 8, 'TimeStamp', True)
| render timechart with(ycolumns=num, fnum)

Stored

let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-2d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   //  delete every 6th hour to create unevenly spaced time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 8, 'TimeStamp', True)
| render timechart with(ycolumns=num, fnum)

Output

Graph showing eighth order polynomial fit to an irregular time series.

Fifth order polynomial with noise on x & y axes

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
    let kwargs = bag_pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_fit_series = kargs["y_fit_series"]
        fit_coeff = kargs["fit_coeff"]
        degree = kargs["degree"]
        x_series = kargs["x_series"]
        x_istime = kargs["x_istime"]
        
        def fit(ts_row, x_col, y_col, deg):
            y = ts_row[y_col]
            if x_col == "": # If there is no x column creates sequential range [1, len(y)]
               x = np.arange(len(y)) + 1
            else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.
               if x_istime: 
                   x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))
                   x = x - x.min()
                   x = x / x.max()
                   x = x * (len(x) - 1) + 1
               else:
                   x = ts_row[x_col]
            coeff = np.polyfit(x, y, deg)
            p = np.poly1d(coeff)
            z = p(x)
            return z, coeff
        
        result = df
        if len(df):
           result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")
    ```;
    tbl
     | evaluate python(typeof(*), code, kwargs)
};
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_fit = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('y', 'y_fit', 'coeff', 5, 'x')
| fork (project-away coeff) (project coeff | mv-expand coeff)
| render linechart

Stored

range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_fit = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('y', 'y_fit', 'coeff', 5, 'x')
| fork (project-away coeff) (project coeff | mv-expand coeff)
| render linechart

Output

Graph of fit of fifth order polynomial with noise on x & y axes

Coefficients of fit of fifth order polynomial with noise.

5.42 - series_lag_fl()

This article describes the series_lag_fl() user-defined function.

Applies a lag on a series.

The function series_lag_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and shifts it backward. It's commonly used for shifting a time series to test whether a pattern is new or matches historical data.

Syntax

series_lag_fl(y_series, offset)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| y_series | dynamic | ✔️ | An array cell of numeric values. |
| offset | int | ✔️ | An integer specifying the required offset in bins. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_lag_fl = (series:dynamic, offset:int)
{
    let lag_f = toscalar(range x from 1 to offset+1 step 1
    | project y=iff(x == offset+1, 1, 0)
    | summarize lag_filter = make_list(y));
    series_fir(series, lag_f, false)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function  with (folder = "Packages\\Series", docstring = "Shift a series by a specified offset")
series_lag_fl(series:dynamic, offset:int)
{
    let lag_f = toscalar(range x from 1 to offset+1 step 1
    | project y=iff(x == offset+1, 1, 0)
    | summarize lag_filter = make_list(y));
    series_fir(series, lag_f, false)
} 
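
The definition builds a filter of length offset+1 whose coefficients are all zero except the last one, and applies it with series_fir() without normalization, so each output sample equals the input sample offset bins earlier and the first offset samples become zero. The following numpy sketch (an illustration with hypothetical names, not the KQL implementation) shows the equivalent convolution:

import numpy as np

def lag(series, offset):
    # Filter [0, ..., 0, 1] of length offset+1: y[n] = sum_k f[k]*x[n-k] = x[n-offset]
    f = np.zeros(offset + 1)
    f[-1] = 1.0
    return np.convolve(series, f)[:len(series)]

lag(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 2)   # -> [0., 0., 1., 2., 3.]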

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_lag_fl = (series:dynamic, offset:int)
{
    let lag_f = toscalar(range x from 1 to offset+1 step 1
    | project y=iff(x == offset+1, 1, 0)
    | summarize lag_filter = make_list(y));
    series_fir(series, lag_f, false)
};
let dt = 1h;
let time_shift = 1d;
let bins_shift = toint(time_shift/dt);
demo_make_series1
| make-series num=count() on TimeStamp step dt by OsVer
| extend num_shifted=series_lag_fl(num, bins_shift)
| render timechart

Stored

let dt = 1h;
let time_shift = 1d;
let bins_shift = toint(time_shift/dt);
demo_make_series1
| make-series num=count() on TimeStamp step dt by OsVer
| extend num_shifted=series_lag_fl(num, bins_shift)
| render timechart

Output

Graph of a time series shifted by one day.

5.43 - series_metric_fl()

This article describes the series_metric_fl() user-defined function.

The series_metric_fl() function is a user-defined function (UDF) that selects and retrieves time series of metrics ingested to your database using the Prometheus monitoring system. This function assumes the data stored in your database is structured following the Prometheus data model. Specifically, each record contains:

  • timestamp
  • metric name
  • metric value
  • a variable set of labels ("key":"value" pairs)

Prometheus defines a time series by its metric name and a distinct set of labels. You can retrieve sets of time series using Prometheus Query Language (PromQL) by specifying the metric name and time series selector (a set of labels).

Syntax

T | invoke series_metric_fl(timestamp_col, name_col, labels_col, value_col, metric_name, labels_selector, lookback, offset)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| timestamp_col | string | ✔️ | The name of the column containing the timestamp. |
| name_col | string | ✔️ | The name of the column containing the metric name. |
| labels_col | string | ✔️ | The name of the column containing the labels dictionary. |
| value_col | string | ✔️ | The name of the column containing the metric value. |
| metric_name | string | ✔️ | The metric time series to retrieve. |
| labels_selector | string | | Time series selector string, similar to PromQL. It's a string containing a list of "key":"value" pairs, for example '"key1":"val1","key2":"val2"'. The default is an empty string, which means no filtering. Note that regular expressions are not supported. |
| lookback | timespan | | The range vector to retrieve, similar to PromQL. The default is 10 minutes. |
| offset | timespan | | Offset back from the current time to retrieve, similar to PromQL. Data is retrieved from ago(offset)-lookback to ago(offset). The default is 0, which means that data is retrieved up to now(). |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_metric_fl=(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
    let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
    let etime = ago(offset);
    let stime = etime - lookback;
    metrics_tbl
    | extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
    | extend labels = dynamic_to_json(labels)       //  convert to string and sort by key
    | where name == metric_name and timestamp between(stime..etime)
    | order by timestamp asc
    | summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
    | where labels has_all (selector_d)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create function with (folder = "Packages\\Series", docstring = "Selecting & retrieving metrics like PromQL")
series_metric_fl(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
    let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
    let etime = ago(offset);
    let stime = etime - lookback;
    metrics_tbl
    | extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
    | extend labels = dynamic_to_json(labels)       //  convert to string and sort by key
    | where name == metric_name and timestamp between(stime..etime)
    | order by timestamp asc
    | summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
    | where labels has_all (selector_d)
}

Examples

The following examples use the invoke operator to run the function.

With specifying selector

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_metric_fl=(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
    let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
    let etime = ago(offset);
    let stime = etime - lookback;
    metrics_tbl
    | extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
    | extend labels = dynamic_to_json(labels)       //  convert to string and sort by key
    | where name == metric_name and timestamp between(stime..etime)
    | order by timestamp asc
    | summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
    | where labels has_all (selector_d)
};
demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1","host":"aks-agentpool-88086459-vmss000001"', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels)

Stored

demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1","host":"aks-agentpool-88086459-vmss000001"', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels)

Output

Graph showing disk write metric over 10 minutes.

Without specifying selector

The following example doesn't specify a selector, so all 'writes' metrics are selected. This example assumes that the function is already installed, and uses the alternative direct calling syntax, specifying the input table as the first parameter:

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_metric_fl=(metrics_tbl:(*), timestamp_col:string, name_col:string, labels_col:string, value_col:string, metric_name:string, labels_selector:string='', lookback:timespan=timespan(10m), offset:timespan=timespan(0))
{
    let selector_d=iff(labels_selector == '', dynamic(['']), split(labels_selector, ','));
    let etime = ago(offset);
    let stime = etime - lookback;
    metrics_tbl
    | extend timestamp = column_ifexists(timestamp_col, datetime(null)), name = column_ifexists(name_col, ''), labels = column_ifexists(labels_col, dynamic(null)), value = column_ifexists(value_col, 0)
    | extend labels = dynamic_to_json(labels)       //  convert to string and sort by key
    | where name == metric_name and timestamp between(stime..etime)
    | order by timestamp asc
    | summarize timestamp = make_list(timestamp), value=make_list(value) by name, labels
    | where labels has_all (selector_d)
};
series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels, ysplit=axes)

Stored

series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| render timechart with(series=labels, ysplit=axes)

Output

Graph showing disk write metric for all disks over 10 minutes.

5.44 - series_monthly_decompose_anomalies_fl()

Learn how to use the series_monthly_decompose_anomalies_fl() function to detect anomalies in a series with monthly seasonality.

Detect anomalous points in a daily series with monthly seasonality.

The function series_monthly_decompose_anomalies_fl() is a user-defined function (UDF) that detects anomalies in multiple time series that have monthly seasonality. The function is built on top of series_decompose_anomalies(). The challenge is that the length of a month varies between 28 and 31 days, so a baseline built with series_decompose_anomalies() out of the box can detect only fixed-period seasonality, and therefore fails to match spikes or other patterns that recur on the 1st or any other specific day of each month.
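
The workaround used here is a virtual calendar in which every month is padded to 31 days, so a given day of month always falls in the same position of a 372-day virtual year. The following Python sketch (a toy illustration with hypothetical names, not the UDF's KQL implementation) shows the mapping; slots for nonexistent dates hold no real sample, so the UDF fills them with the median value of that day of month, producing a regular series on which series_decompose_anomalies() can run with a period of 31.

def virtual_doy(month: int, day: int) -> int:
    # Pad every month to 31 days: day d of month m always lands on the same slot.
    return 31 * (month - 1) + day

virtual_doy(1, 1)    # ->   1 (January 1)
virtual_doy(2, 3)    # ->  34 (February 3)
virtual_doy(12, 31)  # -> 372 (December 31)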

Syntax

T | invoke series_monthly_decompose_anomalies_fl([ threshold ])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| threshold | real | | Anomaly threshold. Default is 1.5. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_monthly_decompose_anomalies_fl=(tbl:(_key:string, _date:datetime, _val:real), threshold:real=1.5)
{
    let _tbl=materialize(tbl
    | extend _year=getyear(_date), _dom = dayofmonth(_date), _moy=monthofyear(_date), _doy=dayofyear(_date)
    | extend _vdoy = 31*(_moy-1)+_dom                  //  virtual day of year (assuming all months have 31 days)
    );
    let median_tbl = _tbl | summarize p50=percentiles(_val, 50) by _key, _dom;
    let keys = _tbl | summarize by _key | extend dummy=1;
    let years = _tbl | summarize by _year | extend dummy=1;
    let vdoys = range _vdoy from 0 to 31*12-1 step 1 | extend _moy=_vdoy/31+1, _vdom=_vdoy%31+1, _vdoy=_vdoy+1 | extend dummy=1
    | join kind=fullouter years on dummy | join kind=fullouter keys on dummy | project-away dummy, dummy1, dummy2;
    vdoys
    | join kind=leftouter _tbl on _key, _year, _vdoy
    | project-away _key1, _year1, _moy1, _vdoy1
    | extend _adoy=31*12*_year+_doy, _vadoy = 31*12*_year+_vdoy
    | partition by _key (as T
        | where _vadoy >= toscalar(T | summarize (_adoy, _vadoy)=arg_min(_adoy, _vadoy) | project _vadoy) and 
          _vadoy <= toscalar(T | summarize (_adoy, _vadoy)=arg_max(_adoy, _vadoy) | project _vadoy)
    )
    | join kind=inner median_tbl on _key, $left._vdom == $right._dom
    | extend _vval = coalesce(_val, p50)
    //| order by _key asc, _vadoy asc     //  for debugging
    | make-series _vval=avg(_vval), _date=any(_date) default=datetime(null) on _vadoy step 1 by _key
    | extend (anomalies, score, baseline) = series_decompose_anomalies(_vval, threshold, 31)
    | mv-expand _date to typeof(datetime), _vval to typeof(real), _vadoy to typeof(long), anomalies to typeof(int), score to typeof(real), baseline to typeof(real)
    | project-away _vadoy
    | project-rename _val=_vval
    | where isnotnull(_date)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for daily time series with monthly seasonality")
series_monthly_decompose_anomalies_fl(tbl:(_key:string, _date:datetime, _val:real), threshold:real=1.5)
{
    let _tbl=materialize(tbl
    | extend _year=getyear(_date), _dom = dayofmonth(_date), _moy=monthofyear(_date), _doy=dayofyear(_date)
    | extend _vdoy = 31*(_moy-1)+_dom                  //  virtual day of year (assuming all months have 31 days)
    );
    let median_tbl = _tbl | summarize p50=percentiles(_val, 50) by _key, _dom;
    let keys = _tbl | summarize by _key | extend dummy=1;
    let years = _tbl | summarize by _year | extend dummy=1;
    let vdoys = range _vdoy from 0 to 31*12-1 step 1 | extend _moy=_vdoy/31+1, _vdom=_vdoy%31+1, _vdoy=_vdoy+1 | extend dummy=1
    | join kind=fullouter years on dummy | join kind=fullouter keys on dummy | project-away dummy, dummy1, dummy2;
    vdoys
    | join kind=leftouter _tbl on _key, _year, _vdoy
    | project-away _key1, _year1, _moy1, _vdoy1
    | extend _adoy=31*12*_year+_doy, _vadoy = 31*12*_year+_vdoy
    | partition by _key (as T
        | where _vadoy >= toscalar(T | summarize (_adoy, _vadoy)=arg_min(_adoy, _vadoy) | project _vadoy) and 
          _vadoy <= toscalar(T | summarize (_adoy, _vadoy)=arg_max(_adoy, _vadoy) | project _vadoy)
    )
    | join kind=inner median_tbl on _key, $left._vdom == $right._dom
    | extend _vval = coalesce(_val, p50)
    //| order by _key asc, _vadoy asc     //  for debugging
    | make-series _vval=avg(_vval), _date=any(_date) default=datetime(null) on _vadoy step 1 by _key
    | extend (anomalies, score, baseline) = series_decompose_anomalies(_vval, threshold, 31)
    | mv-expand _date to typeof(datetime), _vval to typeof(real), _vadoy to typeof(long), anomalies to typeof(int), score to typeof(real), baseline to typeof(real)
    | project-away _vadoy
    | project-rename _val=_vval
    | where isnotnull(_date)
}

Example

The input table must contain _key, _date, and _val columns. The query builds a set of time series of _val for each _key and adds the anomalies, score, and baseline columns.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_monthly_decompose_anomalies_fl=(tbl:(_key:string, _date:datetime, _val:real), threshold:real=1.5)
{
    let _tbl=materialize(tbl
    | extend _year=getyear(_date), _dom = dayofmonth(_date), _moy=monthofyear(_date), _doy=dayofyear(_date)
    | extend _vdoy = 31*(_moy-1)+_dom                  //  virtual day of year (assuming all months have 31 days)
    );
    let median_tbl = _tbl | summarize p50=percentiles(_val, 50) by _key, _dom;
    let keys = _tbl | summarize by _key | extend dummy=1;
    let years = _tbl | summarize by _year | extend dummy=1;
    let vdoys = range _vdoy from 0 to 31*12-1 step 1 | extend _moy=_vdoy/31+1, _vdom=_vdoy%31+1, _vdoy=_vdoy+1 | extend dummy=1
    | join kind=fullouter years on dummy | join kind=fullouter keys on dummy | project-away dummy, dummy1, dummy2;
    vdoys
    | join kind=leftouter _tbl on _key, _year, _vdoy
    | project-away _key1, _year1, _moy1, _vdoy1
    | extend _adoy=31*12*_year+_doy, _vadoy = 31*12*_year+_vdoy
    | partition by _key (as T
        | where _vadoy >= toscalar(T | summarize (_adoy, _vadoy)=arg_min(_adoy, _vadoy) | project _vadoy) and 
          _vadoy <= toscalar(T | summarize (_adoy, _vadoy)=arg_max(_adoy, _vadoy) | project _vadoy)
    )
    | join kind=inner median_tbl on _key, $left._vdom == $right._dom
    | extend _vval = coalesce(_val, p50)
    //| order by _key asc, _vadoy asc     //  for debugging
    | make-series _vval=avg(_vval), _date=any(_date) default=datetime(null) on _vadoy step 1 by _key
    | extend (anomalies, score, baseline) = series_decompose_anomalies(_vval, threshold, 31)
    | mv-expand _date to typeof(datetime), _vval to typeof(real), _vadoy to typeof(long), anomalies to typeof(int), score to typeof(real), baseline to typeof(real)
    | project-away _vadoy
    | project-rename _val=_vval
    | where isnotnull(_date)
};
demo_monthly_ts
| project _key=key, _date=ts, _val=val
| invoke series_monthly_decompose_anomalies_fl()
| project-rename key=_key, ts=_date, val=_val
| render anomalychart with(anomalycolumns=anomalies, xcolumn=ts, ycolumns=val)

Stored

demo_monthly_ts
| project _key=key, _date=ts, _val=val
| invoke series_monthly_decompose_anomalies_fl()
| project-rename key=_key, ts=_date, val=_val
| render anomalychart with(anomalycolumns=anomalies, xcolumn=ts, ycolumns=val)

Output

Series A with monthly anomalies:

Graph of time series ‘A’ with monthly anomalies.

Series B with monthly anomalies:

Graph of time series ‘B’ with monthly anomalies.

5.45 - series_moving_avg_fl()

This article describes the series_moving_avg_fl() user-defined function.

Applies a moving average filter on a series.

The function series_moving_avg_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies on it a simple moving average filter.

Syntax

series_moving_avg_fl(y_series, n [, center ])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| y_series | dynamic | ✔️ | An array cell of numeric values. |
| n | int | ✔️ | The width of the moving average filter. |
| center | bool | | Indicates whether the moving average is applied symmetrically on a window before and after the current point, or on a window from the current point backward. By default, center is false. |
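
The UDF simply wraps series_fir() (see the definition below) with a constant filter of width n and normalization enabled, which yields a simple moving average. For intuition, an equivalent numpy sketch (an illustration with hypothetical names, not the KQL implementation):

import numpy as np

def moving_avg(y, n, center=False):
    f = np.ones(n) / n                           # normalized constant filter
    if center:
        return np.convolve(y, f, mode='same')    # window centered on each point
    return np.convolve(y, f)[:len(y)]            # trailing window ending at each point

moving_avg(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 3)         # trailing average
moving_avg(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 3, True)   # centered average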

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_moving_avg_fl = (y_series:dynamic, n:int, center:bool=false)
{
    series_fir(y_series, repeat(1, n), true, center)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate moving average of specified width")
series_moving_avg_fl(y_series:dynamic, n:int, center:bool=false)
{
    series_fir(y_series, repeat(1, n), true, center)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_moving_avg_fl = (y_series:dynamic, n:int, center:bool=false)
{
    series_fir(y_series, repeat(1, n), true, center)
};
//
//  Moving average of 5 bins
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend num_ma=series_moving_avg_fl(num, 5, True)
| render timechart 

Stored

//
//  Moving average of 5 bins
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend num_ma=series_moving_avg_fl(num, 5, True)
| render timechart 

Output

Graph depicting moving average of 5 bins.

5.46 - series_moving_var_fl()

This article describes the series_moving_var_fl() user-defined function.

Applies a moving variance filter on a series.

The function series_moving_var_fl() is a user-defined function (UDF) that takes an expression containing a dynamic numerical array as input and applies on it a moving variance filter.

Syntax

series_moving_var_fl(y_series, n [, center ])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| y_series | dynamic | ✔️ | An array cell of numeric values. |
| n | int | ✔️ | The width of the moving variance filter. |
| center | bool | | Indicates whether the moving variance is applied symmetrically on a window before and after the current point, or on a window from the current point backward. By default, center is false. |
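
The UDF (see the definition below) computes the moving variance from two moving averages, using the identity Var(y) = E[y^2] - (E[y])^2: it filters both y and y^2 with a normalized constant window and subtracts the squared mean. An equivalent numpy sketch (an illustration with hypothetical names, not the KQL implementation):

import numpy as np

def moving_var(y, n, center=False):
    f = np.ones(n) / n
    mode = 'same' if center else 'full'
    ey = np.convolve(y, f, mode=mode)[:len(y)]                # E[y] over the window
    ey2 = np.convolve(np.square(y), f, mode=mode)[:len(y)]    # E[y^2] over the window
    return ey2 - np.square(ey)                                # Var(y) = E[y^2] - (E[y])^2

moving_var(np.array([1.0, 2.0, 4.0, 8.0, 16.0]), 3)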

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_moving_var_fl = (y_series:dynamic, n:int, center:bool=false)
{
    let ey = series_fir(y_series, repeat(1, n), true, center);
    let e2y = series_multiply(ey, ey);
    let y2 = series_multiply(y_series, y_series);
    let ey2 = series_fir(y2, repeat(1, n), true, center);
    let var_series = series_subtract(ey2, e2y);
    var_series
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Calculate moving variance of specified width")
series_moving_var_fl(y_series:dynamic, n:int, center:bool=false)
{
    let ey = series_fir(y_series, repeat(1, n), true, center);
    let e2y = series_multiply(ey, ey);
    let y2 = series_multiply(y_series, y_series);
    let ey2 = series_fir(y2, repeat(1, n), true, center);
    let var_series = series_subtract(ey2, e2y);
    var_series
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_moving_var_fl = (y_series:dynamic, n:int, center:bool=false)
{
    let ey = series_fir(y_series, repeat(1, n), true, center);
    let e2y = series_multiply(ey, ey);
    let y2 = series_multiply(y_series, y_series);
    let ey2 = series_fir(y2, repeat(1, n), true, center);
    let var_series = series_subtract(ey2, e2y);
    var_series
}
;
let sinewave=(x:double, period:double, gain:double=1.0, phase:double=0.0)
{
    gain*sin(2*pi()/period*(x+phase))
}
;
let n=128;
let T=10;
let window=T*2;
union
(range x from 0 to n-1 step 1 | extend y=sinewave(x, T)),
(range x from n to 2*n-1 step 1 | extend y=0.0),
(range x from 2*n to 3*n-1 step 1 | extend y=sinewave(x, T)),
(range x from 3*n to 4*n-1 step 1 | extend y=(x-3.0*n)/128.0),
(range x from 4*n to 5*n-1 step 1 | extend y=sinewave(x, T))
| order by x asc 
| summarize x=make_list(x), y=make_list(y)
| extend y_var=series_moving_var_fl(y, T, true)
| render linechart  

Stored

let sinewave=(x:double, period:double, gain:double=1.0, phase:double=0.0)
{
    gain*sin(2*pi()/period*(x+phase))
}
;
let n=128;
let T=10;
let window=T*2;
union
(range x from 0 to n-1 step 1 | extend y=sinewave(x, T)),
(range x from n to 2*n-1 step 1 | extend y=0.0),
(range x from 2*n to 3*n-1 step 1 | extend y=sinewave(x, T)),
(range x from 3*n to 4*n-1 step 1 | extend y=(x-3.0*n)/128.0),
(range x from 4*n to 5*n-1 step 1 | extend y=sinewave(x, T))
| order by x asc 
| summarize x=make_list(x), y=make_list(y)
| extend y_var=series_moving_var_fl(y, T, true)
| render linechart

Output

Graph depicting moving variance applied over a sine wave.

5.47 - series_mv_ee_anomalies_fl()

Learn how to use the series_mv_ee_anomalies_fl() user-defined function.

The function series_mv_ee_anomalies_fl() is a user-defined function (UDF) that detects multivariate anomalies in series by applying the elliptic envelope model from scikit-learn. This model assumes that the source of the multivariate data is a multi-dimensional normal distribution. The function accepts a set of series as numerical dynamic arrays, the names of the features columns, and the expected percentage of anomalies out of the whole series. The function builds a multi-dimensional elliptical envelope for each series and marks the points that fall outside this normal envelope as anomalies.
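
The same scikit-learn model can be exercised directly outside the UDF. The following Python sketch (a standalone toy example, not the UDF's embedded code) fits an EllipticEnvelope to two-dimensional points and flags points that fall outside the fitted ellipsoid; predict() returns -1 for anomalies, and decision_function() returns the signed score that the UDF stores in the optional score column.

import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))                    # two-dimensional normally distributed samples
points[:5] += 6                                       # inject a few obvious outliers

model = EllipticEnvelope(contamination=0.04)          # expected ~4% anomalies
model.fit(points)
anomalies = (model.predict(points) < 0).astype(int)   # 1 = anomaly, 0 = normal
scores = model.decision_function(points)              # negative scores indicate anomalies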

Syntax

T | invoke series_mv_ee_anomalies_fl(features_cols, anomaly_col [, score_col [, anomalies_pct ]])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| features_cols | dynamic | ✔️ | An array containing the names of the columns that are used for the multivariate anomaly detection model. |
| anomaly_col | string | ✔️ | The name of the column to store the detected anomalies. |
| score_col | string | | The name of the column to store the scores of the anomalies. |
| anomalies_pct | real | | A real number in the range [0-50] specifying the expected percentage of anomalies in the data. Default value: 4%. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

// Define function
let series_mv_ee_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
    let code = ```if 1:
        from sklearn.covariance import EllipticEnvelope
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        dff = df[features_cols]
        ellipsoid = EllipticEnvelope(contamination=anomalies_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            ellipsoid.fit(dffe)
            df.loc[i, anomaly_col] = (ellipsoid.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = ellipsoid.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for multi dimensional normally distributed data using elliptical envelope model")
series_mv_ee_anomalies_fl(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
    let code = ```if 1:
        from sklearn.covariance import EllipticEnvelope
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        dff = df[features_cols]
        ellipsoid = EllipticEnvelope(contamination=anomalies_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            ellipsoid.fit(dffe)
            df.loc[i, anomaly_col] = (ellipsoid.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = ellipsoid.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

// Define function
let series_mv_ee_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
    let code = ```if 1:
        from sklearn.covariance import EllipticEnvelope
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        dff = df[features_cols]
        ellipsoid = EllipticEnvelope(contamination=anomalies_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            ellipsoid.fit(dffe)
            df.loc[i, anomaly_col] = (ellipsoid.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = ellipsoid.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Usage
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_ee_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores')
| extend anomalies=series_multiply(80, anomalies)
| render timechart

Stored

normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_ee_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores')
| extend anomalies=series_multiply(80, anomalies)
| render timechart

Output

The table normal_2d_with_anomalies contains a set of three time series. Each time series has a two-dimensional normal distribution, with daily anomalies added at midnight, 8am, and 4pm respectively. You can create this sample dataset using an example query.

Graph showing multivariate anomalies on a time chart.

To view the data as a scatter chart, replace the usage code with the following:

normal_2d_with_anomalies
| extend anomalies=dynamic(null)
| invoke series_mv_ee_anomalies_fl(pack_array('x', 'y'), 'anomalies')
| where name == 'TS1'
| project x, y, anomalies
| mv-expand x to typeof(real), y to typeof(real), anomalies to typeof(string)
| render scatterchart with(series=anomalies)

Graph showing multivariate anomalies on a scatter chart.

You can see that on TS1 most of the midnight anomalies were detected using this multivariate model.

Create a sample dataset

.set normal_2d_with_anomalies <|
//
let window=14d;
let dt=1h;
let n=toint(window/dt);
let rand_normal_fl=(avg:real=0.0, stdv:real=1.0)
{
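    //  Sum of 12 uniform [0,1) samples approximates a standard normal variable (Irwin-Hall: mean 6, stdev 1); shift and scale it to the requested avg and stdv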
    let x =rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand()+rand();
    (x - 6)*stdv + avg
};
union
(range s from 0 to n step 1
| project t=startofday(now())-s*dt
| extend x=rand_normal_fl(10, 5)
| extend y=iff(hourofday(t) == 0, 2*(10-x)+7+rand_normal_fl(0, 3), 2*x+7+rand_normal_fl(0, 3))  //  anomalies every midnight
| extend name='TS1'),
(range s from 0 to n step 1
| project t=startofday(now())-s*dt
| extend x=rand_normal_fl(15, 3)
| extend y=iff(hourofday(t) == 8, (15-x)+10+rand_normal_fl(0, 2), x-7+rand_normal_fl(0, 1)) //  anomalies every 8am
| extend name='TS2'),
(range s from 0 to n step 1
| project t=startofday(now())-s*dt
| extend x=rand_normal_fl(8, 6)
| extend y=iff(hourofday(t) == 16, x+5+rand_normal_fl(0, 4), (12-x)+rand_normal_fl(0, 4)) //  anomalies every 4pm
| extend name='TS3')
| summarize t=make_list(t), x=make_list(x), y=make_list(y) by name

Scatter chart of the sample dataset.

5.48 - series_mv_if_anomalies_fl()

This article describes the series_mv_if_anomalies_fl() user-defined function.

The function series_mv_if_anomalies_fl() is a user-defined function (UDF) that detects multivariate anomalies in series by applying the isolation forest model from scikit-learn. The function accepts a set of series as numerical dynamic arrays, the names of the features columns, and the expected percentage of anomalies out of the whole series. The function builds an ensemble of isolation trees for each series and marks the points that are quickly isolated as anomalies.
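
The function's arguments map directly onto the constructor of the scikit-learn estimator, as in the following sketch (an illustration of the mapping only; the estimator is then fitted per series just as in the elliptic envelope variant):

from sklearn.ensemble import IsolationForest

# Hypothetical argument values, shown only to illustrate the parameter mapping.
anomalies_pct, num_trees, samples_pct = 4.0, 100, 100.0
iforest = IsolationForest(
    contamination=anomalies_pct / 100.0,   # expected share of anomalies
    n_estimators=num_trees,                # number of isolation trees
    max_samples=samples_pct / 100.0,       # fraction of samples used to build each tree
    random_state=0)
# iforest.fit(X); anomalies = iforest.predict(X) < 0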

Syntax

T | invoke series_mv_if_anomalies_fl(features_cols, anomaly_col [, score_col [, anomalies_pct [, num_trees [, samples_pct ]]]])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| features_cols | dynamic | ✔️ | An array containing the names of the columns that are used for the multivariate anomaly detection model. |
| anomaly_col | string | ✔️ | The name of the column to store the detected anomalies. |
| score_col | string | | The name of the column to store the scores of the anomalies. |
| anomalies_pct | real | | A real number in the range [0-50] specifying the expected percentage of anomalies in the data. Default value: 4%. |
| num_trees | int | | The number of isolation trees to build for each time series. Default value: 100. |
| samples_pct | real | | A real number in the range [0-100] specifying the percentage of samples used to build each tree. Default value: 100%, that is, use the full series. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

// Define function
let series_mv_if_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0, num_trees:int=100, samples_pct:real=100.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct, 'num_trees', num_trees, 'samples_pct', samples_pct);
    let code = ```if 1:
        from sklearn.ensemble import IsolationForest
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        num_trees = kargs['num_trees']
        samples_pct = kargs['samples_pct']
        dff = df[features_cols]
        iforest = IsolationForest(contamination=anomalies_pct/100.0, random_state=0, n_estimators=num_trees, max_samples=samples_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            iforest.fit(dffe)
            df.loc[i, anomaly_col] = (iforest.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = iforest.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for multi dimensional data using isolation forest model")
series_mv_if_anomalies_fl(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0, num_trees:int=100, samples_pct:real=100.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct, 'num_trees', num_trees, 'samples_pct', samples_pct);
    let code = ```if 1:
        from sklearn.ensemble import IsolationForest
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        num_trees = kargs['num_trees']
        samples_pct = kargs['samples_pct']
        dff = df[features_cols]
        iforest = IsolationForest(contamination=anomalies_pct/100.0, random_state=0, n_estimators=num_trees, max_samples=samples_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            iforest.fit(dffe)
            df.loc[i, anomaly_col] = (iforest.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = iforest.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

// Define function
let series_mv_if_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0, num_trees:int=100, samples_pct:real=100.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct, 'num_trees', num_trees, 'samples_pct', samples_pct);
    let code = ```if 1:
        from sklearn.ensemble import IsolationForest
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        num_trees = kargs['num_trees']
        samples_pct = kargs['samples_pct']
        dff = df[features_cols]
        iforest = IsolationForest(contamination=anomalies_pct/100.0, random_state=0, n_estimators=num_trees, max_samples=samples_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            iforest.fit(dffe)
            df.loc[i, anomaly_col] = (iforest.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = iforest.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Usage
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_if_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=8, num_trees=1000)
| extend anomalies=series_multiply(40, anomalies)
| render timechart

Stored

normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_if_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=8, num_trees=1000)
| extend anomalies=series_multiply(40, anomalies)
| render timechart

Output

The table normal_2d_with_anomalies contains a set of three time series. Each time series has a two-dimensional normal distribution, with daily anomalies added at midnight, 8am, and 4pm respectively. You can create this sample dataset using an example query.

Graph showing multivariate anomalies on a time chart.

To view the data as a scatter chart, replace the usage code with the following:

normal_2d_with_anomalies
| extend anomalies=dynamic(null)
| invoke series_mv_if_anomalies_fl(pack_array('x', 'y'), 'anomalies')
| where name == 'TS1'
| project x, y, anomalies
| mv-expand x to typeof(real), y to typeof(real), anomalies to typeof(string)
| render scatterchart with(series=anomalies)

Graph showing multivariate anomalies on a scatter chart.

You can see that on TS2 most of the anomalies occurring at 8am were detected using this multivariate model.

5.49 - series_mv_oc_anomalies_fl()

This article describes the series_mv_oc_anomalies_fl() user-defined function.

The function series_mv_oc_anomalies_fl() is a user-defined function (UDF) that detects multivariate anomalies in series by applying the One Class SVM model from scikit-learn. The function accepts a set of series as numerical dynamic arrays, the names of the features columns, and the expected percentage of anomalies out of the whole series. The function trains a one-class SVM for each series and marks the points that fall outside the hypersphere as anomalies.
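
In scikit-learn terms, the expected anomaly percentage is passed as the SVM's nu parameter, roughly an upper bound on the fraction of training samples flagged as outliers. A minimal sketch (an illustration only, not the UDF's embedded code):

from sklearn.svm import OneClassSVM

anomalies_pct = 4.0                          # hypothetical value of the UDF argument
svm = OneClassSVM(nu=anomalies_pct / 100.0)  # nu bounds the fraction of outliers
# svm.fit(X); anomalies = svm.predict(X) < 0; scores = svm.decision_function(X)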

Syntax

T | invoke series_mv_oc_anomalies_fl(features_cols, anomaly_col [, score_col [, anomalies_pct ]])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| features_cols | dynamic | ✔️ | An array containing the names of the columns that are used for the multivariate anomaly detection model. |
| anomaly_col | string | ✔️ | The name of the column to store the detected anomalies. |
| score_col | string | | The name of the column to store the scores of the anomalies. |
| anomalies_pct | real | | A real number in the range [0-50] specifying the expected percentage of anomalies in the data. Default value: 4%. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_mv_oc_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
    let code = ```if 1:
        from sklearn.svm import OneClassSVM
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        dff = df[features_cols]
        svm = OneClassSVM(nu=anomalies_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            svm.fit(dffe)
            df.loc[i, anomaly_col] = (svm.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = svm.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Write your query to use the function.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Anomaly Detection for multi dimensional data using One Class SVM model")
series_mv_oc_anomalies_fl(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
    let code = ```if 1:
        from sklearn.svm import OneClassSVM
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        dff = df[features_cols]
        svm = OneClassSVM(nu=anomalies_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            svm.fit(dffe)
            df.loc[i, anomaly_col] = (svm.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = svm.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_mv_oc_anomalies_fl=(tbl:(*), features_cols:dynamic, anomaly_col:string, score_col:string='', anomalies_pct:real=4.0)
{
    let kwargs = bag_pack('features_cols', features_cols, 'anomaly_col', anomaly_col, 'score_col', score_col, 'anomalies_pct', anomalies_pct);
    let code = ```if 1:
        from sklearn.svm import OneClassSVM
        features_cols = kargs['features_cols']
        anomaly_col = kargs['anomaly_col']
        score_col = kargs['score_col']
        anomalies_pct = kargs['anomalies_pct']
        dff = df[features_cols]
        svm = OneClassSVM(nu=anomalies_pct/100.0)
        for i in range(len(dff)):
            dffi = dff.iloc[[i], :]
            dffe = dffi.explode(features_cols)
            svm.fit(dffe)
            df.loc[i, anomaly_col] = (svm.predict(dffe) < 0).astype(int).tolist()
            if score_col != '':
                df.loc[i, score_col] = svm.decision_function(dffe).tolist()
        result = df
    ```;
    tbl
    | evaluate hint.distribution=per_node python(typeof(*), code, kwargs)
};
// Usage
normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_oc_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=6)
| extend anomalies=series_multiply(80, anomalies)
| render timechart

Stored

normal_2d_with_anomalies
| extend anomalies=dynamic(null), scores=dynamic(null)
| invoke series_mv_oc_anomalies_fl(pack_array('x', 'y'), 'anomalies', 'scores', anomalies_pct=6)
| extend anomalies=series_multiply(80, anomalies)
| render timechart

Output

The table normal_2d_with_anomalies contains a set of three time series. Each time series has a two-dimensional normal distribution, with daily anomalies added at midnight, 8am, and 4pm, respectively. You can create this sample dataset using an example query.

Graph showing multivariate anomalies on a time chart.

To view the data as a scatter chart, replace the usage code with the following:

normal_2d_with_anomalies
| extend anomalies=dynamic(null)
| invoke series_mv_oc_anomalies_fl(pack_array('x', 'y'), 'anomalies')
| where name == 'TS1'
| project x, y, anomalies
| mv-expand x to typeof(real), y to typeof(real), anomalies to typeof(string)
| render scatterchart with(series=anomalies)

Graph showing multivariate anomalies on a scatter chart.

You can see that on TS1 most of the anomalies occurring at midnight were detected by this multivariate model.

5.50 - series_rate_fl()

This article describes the series_rate_fl() user-defined function.

The function series_rate_fl() is a user-defined function (UDF) that calculates the average rate of metric increase per second. Its logic follows the PromQL rate() function. It should be used for time series of counter metrics ingested to your database by the Prometheus monitoring system and retrieved by series_metric_fl().

Syntax

T | invoke series_rate_fl([ n_bins [, fix_reset ]])

T is a table returned from series_metric_fl(). Its schema includes (timestamp:dynamic, name:string, labels:string, value:dynamic).

Parameters

Name | Type | Required | Description
n_bins | int | | The number of bins specifying the gap between the extracted metric values used to calculate the rate. The function calculates the difference between the current sample and the one n_bins before it, and divides it by the difference of their respective timestamps in seconds (see the sketch after this table). The default is one bin; the default settings calculate irate(), the PromQL instantaneous rate function.
fix_reset | bool | | Controls whether to check for counter resets and correct them, as the PromQL rate() function does. The default is true. Set it to false to skip this redundant analysis when there is no need to check for resets.
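
To illustrate the n_bins mechanics, here's a minimal Python sketch. It's a simplified stand-in for the KQL series operations, with timestamps assumed to be already expressed in seconds; the KQL version divides tick differences by 1e7 to convert to seconds.

# Simplified sketch of the rate calculation performed by series_rate_fl(), assuming
# counter resets are already corrected. With n_bins=1 this reproduces the PromQL
# irate() behavior: difference with the previous sample divided by elapsed seconds.
import numpy as np

timestamps = np.array([0, 60, 120, 180, 240], dtype=float)   # seconds
values = np.array([10, 25, 55, 70, 100], dtype=float)        # counter samples
n_bins = 1

dt = timestamps[n_bins:] - timestamps[:-n_bins]               # seconds between samples
dv = values[n_bins:] - values[:-n_bins]                       # counter increase
rate = dv / dt                                                # increase per second
print(rate)                                                   # [0.25 0.5  0.25 0.5 ]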

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_rate_fl=(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
    tbl
    | where fix_reset                                                   //  Prometheus counters can only go up
    | mv-apply value to typeof(double) on   
    ( extend correction = iff(value < prev(value), prev(value), 0.0)    // if the value decreases we assume it was reset to 0, so add last value
    | extend cum_correction = row_cumsum(correction)
    | extend corrected_value = value + cum_correction
    | summarize value = make_list(corrected_value))
    | union (tbl | where not(fix_reset))
    | extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
    | extend dt = series_subtract(timestamp, timestampS)
    | extend dt = series_divide(dt, 1e7)                              //  converts from ticks to seconds
    | extend dv = series_subtract(value, valueS)
    | extend rate = series_divide(dv, dt)
    | project-away dt, dv, timestampS, value, valueS
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create function with (folder = "Packages\\Series", docstring = "Simulate PromQL rate()")
series_rate_fl(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
    tbl
    | where fix_reset                                                   //  Prometheus counters can only go up
    | mv-apply value to typeof(double) on   
    ( extend correction = iff(value < prev(value), prev(value), 0.0)    // if the value decreases we assume it was reset to 0, so add last value
    | extend cum_correction = row_cumsum(correction)
    | extend corrected_value = value + cum_correction
    | summarize value = make_list(corrected_value))
    | union (tbl | where not(fix_reset))
    | extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
    | extend dt = series_subtract(timestamp, timestampS)
    | extend dt = series_divide(dt, 1e7)                              //  converts from ticks to seconds
    | extend dv = series_subtract(value, valueS)
    | extend rate = series_divide(dv, dt)
    | project-away dt, dv, timestampS, value, valueS
}

Examples

The following examples use the invoke operator to run the function.

Calculate average rate of metric increase

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_rate_fl=(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
    tbl
    | where fix_reset                                                   //  Prometheus counters can only go up
    | mv-apply value to typeof(double) on   
    ( extend correction = iff(value < prev(value), prev(value), 0.0)    // if the value decreases we assume it was reset to 0, so add last value
    | extend cum_correction = row_cumsum(correction)
    | extend corrected_value = value + cum_correction
    | summarize value = make_list(corrected_value))
    | union (tbl | where not(fix_reset))
    | extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
    | extend dt = series_subtract(timestamp, timestampS)
    | extend dt = series_divide(dt, 1e7)                              //  converts from ticks to seconds
    | extend dv = series_subtract(value, valueS)
    | extend rate = series_divide(dv, dt)
    | project-away dt, dv, timestampS, value, valueS
};
//
demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| invoke series_rate_fl(2)
| render timechart with(series=labels)

Stored

demo_prometheus
| invoke series_metric_fl('TimeStamp', 'Name', 'Labels', 'Val', 'writes', offset=now()-datetime(2020-12-08 00:00))
| invoke series_rate_fl(2)
| render timechart with(series=labels)

Output

Graph showing rate per second of disk write metric for all disks.

Selects the main disk of two hosts

The following example selects the main disk of two hosts, and assumes that the function is already installed. This example uses the alternative direct calling syntax, specifying the input table as the first parameter:

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_rate_fl=(tbl:(timestamp:dynamic, value:dynamic), n_bins:int=1, fix_reset:bool=true)
{
    tbl
    | where fix_reset                                                   //  Prometheus counters can only go up
    | mv-apply value to typeof(double) on   
    ( extend correction = iff(value < prev(value), prev(value), 0.0)    // if the value decreases we assume it was reset to 0, so add last value
    | extend cum_correction = row_cumsum(correction)
    | extend corrected_value = value + cum_correction
    | summarize value = make_list(corrected_value))
    | union (tbl | where not(fix_reset))
    | extend timestampS = array_shift_right(timestamp, n_bins), valueS = array_shift_right(value, n_bins)
    | extend dt = series_subtract(timestamp, timestampS)
    | extend dt = series_divide(dt, 1e7)                              //  converts from ticks to seconds
    | extend dv = series_subtract(value, valueS)
    | extend rate = series_divide(dv, dt)
    | project-away dt, dv, timestampS, value, valueS
};
//
series_rate_fl(series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1"', lookback=2h, offset=now()-datetime(2020-12-08 00:00)), n_bins=10)
| render timechart with(series=labels)

Stored

series_rate_fl(series_metric_fl(demo_prometheus, 'TimeStamp', 'Name', 'Labels', 'Val', 'writes', '"disk":"sda1"', lookback=2h, offset=now()-datetime(2020-12-08 00:00)), n_bins=10)
| render timechart with(series=labels)

Output

Graph showing rate per second of main disk write metric in the last two hours with 10 bins gap.

5.51 - series_rolling_fl()

This article describes the series_rolling_fl() user-defined function.

The function series_rolling_fl() is a user-defined function (UDF) that applies rolling aggregation on a series. It takes a table containing multiple series (dynamic numerical array) and applies, for each series, a rolling aggregation function.

Syntax

T | invoke series_rolling_fl(y_series, y_rolling_series, n, aggr, aggr_params, center)

Parameters

Name | Type | Required | Description
y_series | string | ✔️ | The name of the column that contains the series to fit.
y_rolling_series | string | ✔️ | The name of the column to store the rolling aggregation series.
n | int | ✔️ | The width of the rolling window.
aggr | string | ✔️ | The name of the aggregation function to use. See aggregation functions.
aggr_params | string | | Optional parameters for the aggregation function.
center | bool | | Indicates whether the rolling window is applied symmetrically before and after the current point, or from the current point backwards. By default, center is false, for calculation on streaming data.

Aggregation functions

This function supports any aggregation function from numpy or scipy.stats that calculates a scalar from a series, for example the median, min, max, percentile, and tmean functions used in the examples below.
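
As a rough standalone sketch of what the embedded Python code does for a single series (assuming pandas and numpy are available, as they are in the python() plugin sandbox), the aggregation name is resolved against numpy first, then scipy.stats as a fallback, and applied over a rolling window:

# Standalone sketch of the rolling aggregation applied per series by series_rolling_fl():
# resolve the aggregation by name from numpy (falling back to scipy.stats), then apply
# it over a centered rolling window with pandas.
import numpy as np
import pandas as pd

series = pd.Series([1, 2, 10, 3, 4, 5, 20, 6, 7])
n = 3                                   # rolling window width
aggr = "median"                         # any scalar-producing numpy/scipy.stats function

func = getattr(np, aggr, None)
if func is None:
    import scipy.stats
    func = getattr(scipy.stats, aggr)

rolling = series.rolling(n, center=True, min_periods=1).apply(func)
print(rolling.tolist())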

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
    let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_rolling_series = kargs["y_rolling_series"]
        n = kargs["n"]
        aggr = kargs["aggr"]
        aggr_params = kargs["aggr_params"]
        center = kargs["center"]
        result = df
        in_s = df[y_series]
        func = getattr(np, aggr, None)
        if not func:
            import scipy.stats
            func = getattr(scipy.stats, aggr)
        if func:
            result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Rolling window functions on a series")
series_rolling_fl(tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic, center:bool=true)
{
    let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_rolling_series = kargs["y_rolling_series"]
        n = kargs["n"]
        aggr = kargs["aggr"]
        aggr_params = kargs["aggr_params"]
        center = kargs["center"]
        result = df
        in_s = df[y_series]
        func = getattr(np, aggr, None)
        if not func:
            import scipy.stats
            func = getattr(scipy.stats, aggr)
        if func:
            result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Examples

The following examples use the invoke operator to run the function.

Calculate rolling median of 9 elements

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
    let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_rolling_series = kargs["y_rolling_series"]
        n = kargs["n"]
        aggr = kargs["aggr"]
        aggr_params = kargs["aggr_params"]
        center = kargs["center"]
        result = df
        in_s = df[y_series]
        func = getattr(np, aggr, None)
        if not func:
            import scipy.stats
            func = getattr(scipy.stats, aggr)
        if func:
            result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
//
//  Calculate rolling median of 9 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_med = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_med', 9, 'median')
| render timechart

Stored

//
//  Calculate rolling median of 9 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_med = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_med', 9, 'median', dynamic([null]))
| render timechart

Output

Graph depicting rolling median of 9 elements.

Calculate rolling min, max & 75th percentile of 15 elements

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
    let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_rolling_series = kargs["y_rolling_series"]
        n = kargs["n"]
        aggr = kargs["aggr"]
        aggr_params = kargs["aggr_params"]
        center = kargs["center"]
        result = df
        in_s = df[y_series]
        func = getattr(np, aggr, None)
        if not func:
            import scipy.stats
            func = getattr(scipy.stats, aggr)
        if func:
            result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
//
//  Calculate rolling min, max & 75th percentile of 15 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_min = dynamic(null), rolling_max = dynamic(null), rolling_pct = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_min', 15, 'min', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_max', 15, 'max', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_pct', 15, 'percentile', dynamic([75]))
| render timechart

Stored

//
//  Calculate rolling min, max & 75th percentile of 15 elements
//
demo_make_series1
| make-series num=count() on TimeStamp step 1h by OsVer
| extend rolling_min = dynamic(null), rolling_max = dynamic(null), rolling_pct = dynamic(null)
| invoke series_rolling_fl('num', 'rolling_min', 15, 'min', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_max', 15, 'max', dynamic([null]))
| invoke series_rolling_fl('num', 'rolling_pct', 15, 'percentile', dynamic([75]))
| render timechart

Output

Graph depicting rolling min, max & 75th percentile of 15 elements.

Calculate the rolling trimmed mean

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_rolling_fl = (tbl:(*), y_series:string, y_rolling_series:string, n:int, aggr:string, aggr_params:dynamic=dynamic([null]), center:bool=true)
{
    let kwargs = bag_pack('y_series', y_series, 'y_rolling_series', y_rolling_series, 'n', n, 'aggr', aggr, 'aggr_params', aggr_params, 'center', center);
    let code = ```if 1:
        y_series = kargs["y_series"]
        y_rolling_series = kargs["y_rolling_series"]
        n = kargs["n"]
        aggr = kargs["aggr"]
        aggr_params = kargs["aggr_params"]
        center = kargs["center"]
        result = df
        in_s = df[y_series]
        func = getattr(np, aggr, None)
        if not func:
            import scipy.stats
            func = getattr(scipy.stats, aggr)
        if func:
            result[y_rolling_series] = list(pd.Series(in_s[i]).rolling(n, center=center, min_periods=1).apply(func, args=tuple(aggr_params)).values for i in range(len(in_s)))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
range x from 1 to 100 step 1
| extend y=iff(x % 13 == 0, 2.0, iff(x % 23 == 0, -2.0, rand()))
| summarize x=make_list(x), y=make_list(y)
| extend yr = dynamic(null)
| invoke series_rolling_fl('y', 'yr', 7, 'tmean', pack_array(pack_array(-2, 2), pack_array(false, false))) //  trimmed mean: ignoring values outside [-2,2] inclusive
| render linechart

Stored

range x from 1 to 100 step 1
| extend y=iff(x % 13 == 0, 2.0, iff(x % 23 == 0, -2.0, rand()))
| summarize x=make_list(x), y=make_list(y)
| extend yr = dynamic(null)
| invoke series_rolling_fl('y', 'yr', 7, 'tmean', pack_array(pack_array(-2, 2), pack_array(false, false))) //  trimmed mean: ignoring values outside [-2,2] inclusive
| render linechart

Output

Graph depicting rolling trimmed mean.

5.52 - series_shapes_fl()

This article describes the series_shapes_fl() user-defined function.

The function series_shapes_fl() is a user-defined function (UDF) that detects a positive/negative trend or jump in a series. This function takes a table containing multiple time series (dynamic numerical arrays), and calculates a trend score and a jump score for each series. The output is a dictionary (dynamic) containing the scores.

Syntax

T | extend series_shapes_fl(y_series, advanced)

Parameters

Name | Type | Required | Description
y_series | dynamic | ✔️ | An array cell of numeric values.
advanced | bool | | The default is false. Set to true to output additional calculated parameters.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_shapes_fl=(series:dynamic, advanced:bool=false)
{
    let n = array_length(series);
//  calculate normal dynamic range between 10th and 90th percentiles
    let xs = array_sort_asc(series);
    let low_idx = tolong(n*0.1);
    let high_idx = tolong(n*0.9);
    let low_pct = todouble(xs[low_idx]);
    let high_pct = todouble(xs[high_idx]);
    let norm_range = high_pct-low_pct;
//  trend score
    let lf = series_fit_line_dynamic(series);
    let slope = todouble(lf.slope);
    let rsquare = todouble(lf.rsquare);
    let rel_slope = abs(n*slope/norm_range);
    let sign_slope = iff(slope >= 0.0, 1.0, -1.0);
    let norm_slope = sign_slope*rel_slope/(rel_slope+0.1);  //  map rel_slope from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
    let trend_score = norm_slope*rsquare;
//  jump score
    let lf2=series_fit_2lines_dynamic(series);
    let lslope = todouble(lf2.left.slope);
    let rslope = todouble(lf2.right.slope);
    let rsquare2 = todouble(lf2.rsquare);
    let split_idx = tolong(lf2.split_idx);
    let last_left = todouble(lf2.left.interception)+lslope*split_idx;
    let first_right = todouble(lf2.right.interception)+rslope;
    let jump = first_right-last_left;
    let rel_jump = abs(jump/norm_range);
    let sign_jump = iff(first_right >= last_left, 1.0, -1.0);
    let norm_jump = sign_jump*rel_jump/(rel_jump+0.1);  //  map rel_jump from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
    let jump_score1 = norm_jump*rsquare2;
//  filter for jumps that are not close to the series edges and the right slope has the same direction
    let norm_rslope = abs(rslope/norm_range);
    let jump_score = iff((sign_jump*rslope >= 0.0 or norm_rslope < 0.02) and split_idx between((0.1*n)..(0.9*n)), jump_score1, 0.0);
    let res = iff(advanced, bag_pack("n", n, "low_pct", low_pct, "high_pct", high_pct, "norm_range", norm_range, "slope", slope, "rsquare", rsquare, "rel_slope", rel_slope, "norm_slope", norm_slope,
                              "trend_score", trend_score, "split_idx", split_idx, "jump", jump, "rsquare2", rsquare2, "last_left", last_left, "first_right", first_right, "rel_jump", rel_jump,
                              "lslope", lslope, "rslope", rslope, "norm_rslope", norm_rslope, "norm_jump", norm_jump, "jump_score", jump_score)
                              , bag_pack("trend_score", trend_score, "jump_score", jump_score));
    res
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Series detector for positive/negative trend or step. Returns a dynamic with trend and jump scores")
series_shapes_fl(series:dynamic, advanced:bool=false)
{
    let n = array_length(series);
//  calculate normal dynamic range between 10th and 90th percentiles
    let xs = array_sort_asc(series);
    let low_idx = tolong(n*0.1);
    let high_idx = tolong(n*0.9);
    let low_pct = todouble(xs[low_idx]);
    let high_pct = todouble(xs[high_idx]);
    let norm_range = high_pct-low_pct;
//  trend score
    let lf = series_fit_line_dynamic(series);
    let slope = todouble(lf.slope);
    let rsquare = todouble(lf.rsquare);
    let rel_slope = abs(n*slope/norm_range);
    let sign_slope = iff(slope >= 0.0, 1.0, -1.0);
    let norm_slope = sign_slope*rel_slope/(rel_slope+0.1);  //  map rel_slope from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
    let trend_score = norm_slope*rsquare;
//  jump score
    let lf2=series_fit_2lines_dynamic(series);
    let lslope = todouble(lf2.left.slope);
    let rslope = todouble(lf2.right.slope);
    let rsquare2 = todouble(lf2.rsquare);
    let split_idx = tolong(lf2.split_idx);
    let last_left = todouble(lf2.left.interception)+lslope*split_idx;
    let first_right = todouble(lf2.right.interception)+rslope;
    let jump = first_right-last_left;
    let rel_jump = abs(jump/norm_range);
    let sign_jump = iff(first_right >= last_left, 1.0, -1.0);
    let norm_jump = sign_jump*rel_jump/(rel_jump+0.1);  //  map rel_jump from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
    let jump_score1 = norm_jump*rsquare2;
//  filter for jumps that are not close to the series edges and the right slope has the same direction
    let norm_rslope = abs(rslope/norm_range);
    let jump_score = iff((sign_jump*rslope >= 0.0 or norm_rslope < 0.02) and split_idx between((0.1*n)..(0.9*n)), jump_score1, 0.0);
    let res = iff(advanced, bag_pack("n", n, "low_pct", low_pct, "high_pct", high_pct, "norm_range", norm_range, "slope", slope, "rsquare", rsquare, "rel_slope", rel_slope, "norm_slope", norm_slope,
                              "trend_score", trend_score, "split_idx", split_idx, "jump", jump, "rsquare2", rsquare2, "last_left", last_left, "first_right", first_right, "rel_jump", rel_jump,
                              "lslope", lslope, "rslope", rslope, "norm_rslope", norm_rslope, "norm_jump", norm_jump, "jump_score", jump_score)
                              , bag_pack("trend_score", trend_score, "jump_score", jump_score));
    res
}

Example

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_shapes_fl=(series:dynamic, advanced:bool=false)
{
    let n = array_length(series);
//  calculate normal dynamic range between 10th and 90th percentiles
    let xs = array_sort_asc(series);
    let low_idx = tolong(n*0.1);
    let high_idx = tolong(n*0.9);
    let low_pct = todouble(xs[low_idx]);
    let high_pct = todouble(xs[high_idx]);
    let norm_range = high_pct-low_pct;
//  trend score
    let lf = series_fit_line_dynamic(series);
    let slope = todouble(lf.slope);
    let rsquare = todouble(lf.rsquare);
    let rel_slope = abs(n*slope/norm_range);
    let sign_slope = iff(slope >= 0.0, 1.0, -1.0);
    let norm_slope = sign_slope*rel_slope/(rel_slope+0.1);  //  map rel_slope from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
    let trend_score = norm_slope*rsquare;
//  jump score
    let lf2=series_fit_2lines_dynamic(series);
    let lslope = todouble(lf2.left.slope);
    let rslope = todouble(lf2.right.slope);
    let rsquare2 = todouble(lf2.rsquare);
    let split_idx = tolong(lf2.split_idx);
    let last_left = todouble(lf2.left.interception)+lslope*split_idx;
    let first_right = todouble(lf2.right.interception)+rslope;
    let jump = first_right-last_left;
    let rel_jump = abs(jump/norm_range);
    let sign_jump = iff(first_right >= last_left, 1.0, -1.0);
    let norm_jump = sign_jump*rel_jump/(rel_jump+0.1);  //  map rel_jump from [-Inf, +Inf] to [-1, 1]; 0.1 is a calibration constant
    let jump_score1 = norm_jump*rsquare2;
//  filter for jumps that are not close to the series edges and the right slope has the same direction
    let norm_rslope = abs(rslope/norm_range);
    let jump_score = iff((sign_jump*rslope >= 0.0 or norm_rslope < 0.02) and split_idx between((0.1*n)..(0.9*n)), jump_score1, 0.0);
    let res = iff(advanced, bag_pack("n", n, "low_pct", low_pct, "high_pct", high_pct, "norm_range", norm_range, "slope", slope, "rsquare", rsquare, "rel_slope", rel_slope, "norm_slope", norm_slope,
                              "trend_score", trend_score, "split_idx", split_idx, "jump", jump, "rsquare2", rsquare2, "last_left", last_left, "first_right", first_right, "rel_jump", rel_jump,
                              "lslope", lslope, "rslope", rslope, "norm_rslope", norm_rslope, "norm_jump", norm_jump, "jump_score", jump_score)
                              , bag_pack("trend_score", trend_score, "jump_score", jump_score));
    res
};
let ts_len = 100;
let noise_pct = 2;
let noise_gain = 3;
union
(print tsid=1 | extend y = array_concat(repeat(20, ts_len/2), repeat(150, ts_len/2))),
(print tsid=2 | extend y = array_concat(repeat(0, ts_len*3/4), repeat(-50, ts_len/4))),
(print tsid=3 | extend y = range(40, 139, 1)),
(print tsid=4 | extend y = range(-20, -109, -1))
| extend x = range(1, array_length(y), 1)
//
| extend shapes = series_shapes_fl(y)
| order by tsid asc 
| fork (take 4) (project tsid, shapes)
| render timechart with(series=tsid, xcolumn=x, ycolumns=y)

Stored

let ts_len = 100;
let noise_pct = 2;
let noise_gain = 3;
union
(print tsid=1 | extend y = array_concat(repeat(20, ts_len/2), repeat(150, ts_len/2))),
(print tsid=2 | extend y = array_concat(repeat(0, ts_len*3/4), repeat(-50, ts_len/4))),
(print tsid=3 | extend y = range(40, 139, 1)),
(print tsid=4 | extend y = range(-20, -109, -1))
| extend x = range(1, array_length(y), 1)
//
| extend shapes = series_shapes_fl(y)
| order by tsid asc 
| fork (take 4) (project tsid, shapes)
| render timechart with(series=tsid, xcolumn=x, ycolumns=y)

Output

Graph showing 4 time series with trends and jumps.

The respective trend and jump scores:

tsid	shapes
1	    {
          "trend_score": 0.703199714530169,
          "jump_score": 0.90909090909090906
        }
2	    {
          "trend_score": -0.51663751343174869,
          "jump_score": -0.90909090909090906
        }
3	    {
          "trend_score": 0.92592592592592582,
          "jump_score": 0.0
        }
4	    {
          "trend_score": -0.92592592592592582,
          "jump_score": 0.0
        }
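
The trend score of TS3, a clean linear ramp, can be reproduced from the scoring logic in the function definition above. The following minimal Python sketch is an illustration only, not the KQL function:

# Minimal sketch reproducing the TS3 trend score from the logic of series_shapes_fl():
# a perfect linear ramp of 100 points, 40..139.
import numpy as np

y = np.arange(40, 140)                      # TS3: 100 points, perfect linear trend
n = len(y)

xs = np.sort(y)
low_pct, high_pct = xs[int(n * 0.1)], xs[int(n * 0.9)]
norm_range = high_pct - low_pct             # 130 - 50 = 80

slope, intercept = np.polyfit(np.arange(n), y, 1)
residuals = y - (slope * np.arange(n) + intercept)
rsquare = 1 - residuals.var() / y.var()     # ~1.0 for a perfect line

rel_slope = abs(n * slope / norm_range)                       # 100 * 1 / 80 = 1.25
norm_slope = np.sign(slope) * rel_slope / (rel_slope + 0.1)   # map to [-1, 1]
trend_score = norm_slope * rsquare
print(round(trend_score, 6))                # ~0.925926, matching the TS3 score above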

5.53 - series_uv_anomalies_fl()

This article describes the series_uv_anomalies_fl() user-defined function.

The function series_uv_anomalies_fl() is a user-defined function (UDF) that detects anomalies in time series by calling the Univariate Anomaly Detection API, part of Azure Cognitive Services. The function accepts a limited set of time series as numerical dynamic arrays and the required anomaly detection sensitivity level. Each time series is converted into the required JSON format and posted to the Anomaly Detector service endpoint. The service response contains dynamic arrays of high/low/all anomalies, the modeled baseline time series, its normal high/low boundaries (a value above the high boundary or below the low boundary is an anomaly), and the detected seasonality.
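
For reference, the request body that the embedded Python code builds for each series has roughly the following shape; the numbers here are placeholders, not real data:

# Sketch of the per-series request body built by series_uv_anomalies_fl() before it's
# posted to the Anomaly Detector endpoint. The values below are placeholders only.
import json

series_values = [120.5, 118.2, 240.9, 121.7]          # one numeric dynamic array from y_series
sensitivity = 85                                       # the sensitivity parameter

json_data = {
    "series": [{"value": v} for v in series_values],   # same structure as the embedded code
    "sensitivity": sensitivity,                        # period is auto-detected by the service
}
print(json.dumps(json_data))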

Prerequisites

In the following function example, replace YOUR-AD-RESOURCE-NAME in the uri and YOUR-KEY in the Ocp-Apim-Subscription-Key of the header with your Anomaly Detector resource name and key.

Syntax

T | invoke series_uv_anomalies_fl(y_series [, sensitivity [, tsid]])

Parameters

Name | Type | Required | Description
y_series | string | ✔️ | The name of the input table column containing the values of the series to be anomaly detected.
sensitivity | integer | | An integer in the range [0-100] specifying the anomaly detection sensitivity. 0 is the least sensitive detection, while 100 is the most sensitive, indicating that even a small deviation from the expected baseline is tagged as an anomaly. Default value: 85.
tsid | string | | The name of the input table column containing the time series ID. Can be omitted when analyzing a single time series.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let series_uv_anomalies_fl=(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
    let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
    let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
    let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
    let code = ```if 1:
        import json
        y_series = kargs["y_series"]
        sensitivity = kargs["sensitivity"]
        json_str = []
        for i in range(len(df)):
            row = df.iloc[i, :]
            ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
            json_data = {'series': ts, "sensitivity":sensitivity}     # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
            json_str = json_str + [json.dumps(json_data)]
        result = df
        result['json_str'] = json_str
    ```;
    tbl
    | evaluate python(typeof(*, json_str:string), code, kwargs)
    | extend _tsid = column_ifexists(tsid, 1)
    | partition by _tsid (
       project json_str
       | evaluate http_request_post(uri, headers, dynamic(null))
       | project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
       , neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
       | extend _tsid=toscalar(_tsid)
      )
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Time Series Anomaly Detection by Azure Cognitive Service")
series_uv_anomalies_fl(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
    let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
    let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
    let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
    let code = ```if 1:
        import json
        y_series = kargs["y_series"]
        sensitivity = kargs["sensitivity"]
        json_str = []
        for i in range(len(df)):
            row = df.iloc[i, :]
            ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
            json_data = {'series': ts, "sensitivity":sensitivity}     # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
            json_str = json_str + [json.dumps(json_data)]
        result = df
        result['json_str'] = json_str
    ```;
    tbl
    | evaluate python(typeof(*, json_str:string), code, kwargs)
    | extend _tsid = column_ifexists(tsid, 1)
    | partition by _tsid (
       project json_str
       | evaluate http_request_post(uri, headers, dynamic(null))
       | project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
       , neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
       | extend _tsid=toscalar(_tsid)
      )
}

Examples

The following examples use the invoke operator to run the function.

Use series_uv_anomalies_fl() to detect anomalies

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_uv_anomalies_fl=(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
    let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
    let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
    let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
    let code = ```if 1:
        import json
        y_series = kargs["y_series"]
        sensitivity = kargs["sensitivity"]
        json_str = []
        for i in range(len(df)):
            row = df.iloc[i, :]
            ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
            json_data = {'series': ts, "sensitivity":sensitivity}     # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
            json_str = json_str + [json.dumps(json_data)]
        result = df
        result['json_str'] = json_str
    ```;
    tbl
    | evaluate python(typeof(*, json_str:string), code, kwargs)
    | extend _tsid = column_ifexists(tsid, 1)
    | partition by _tsid (
       project json_str
       | evaluate http_request_post(uri, headers, dynamic(null))
       | project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
       , neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
       | extend _tsid=toscalar(_tsid)
      )
};
let etime=datetime(2017-03-02);
let stime=datetime(2017-01-01);
let dt=1h;
let ts = requests
| make-series value=avg(value) on timestamp from stime to etime step dt
| extend _tsid='TS1';
ts
| invoke series_uv_anomalies_fl('value')
| lookup ts on _tsid
| render anomalychart with(xcolumn=timestamp, ycolumns=value, anomalycolumns=ad_ama)

Stored

let etime=datetime(2017-03-02);
let stime=datetime(2017-01-01);
let dt=1h;
let ts = requests
| make-series value=avg(value) on timestamp from stime to etime step dt
| extend _tsid='TS1';
ts
| invoke series_uv_anomalies_fl('value')
| lookup ts on _tsid
| render anomalychart with(xcolumn=timestamp, ycolumns=value, anomalycolumns=ad_ama)

Output

Graph showing anomalies on a time series.

Compare series_uv_anomalies_fl() and native series_decompose_anomalies()

The following example compares the Univariate Anomaly Detection API to the native series_decompose_anomalies() function over three time series and assumes the series_uv_anomalies_fl() function is already defined in the database:

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_uv_anomalies_fl=(tbl:(*), y_series:string, sensitivity:int=85, tsid:string='_tsid')
{
    let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/entire/detect';
    let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
    let kwargs = bag_pack('y_series', y_series, 'sensitivity', sensitivity);
    let code = ```if 1:
        import json
        y_series = kargs["y_series"]
        sensitivity = kargs["sensitivity"]
        json_str = []
        for i in range(len(df)):
            row = df.iloc[i, :]
            ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
            json_data = {'series': ts, "sensitivity":sensitivity}     # auto-detect period, or we can force 'period': 84. We can also add 'maxAnomalyRatio':0.25 for maximum 25% anomalies
            json_str = json_str + [json.dumps(json_data)]
        result = df
        result['json_str'] = json_str
    ```;
    tbl
    | evaluate python(typeof(*, json_str:string), code, kwargs)
    | extend _tsid = column_ifexists(tsid, 1)
    | partition by _tsid (
       project json_str
       | evaluate http_request_post(uri, headers, dynamic(null))
       | project period=ResponseBody.period, baseline_ama=ResponseBody.expectedValues, ad_ama=series_add(0, ResponseBody.isAnomaly), pos_ad_ama=series_add(0, ResponseBody.isPositiveAnomaly)
       , neg_ad_ama=series_add(0, ResponseBody.isNegativeAnomaly), upper_ama=series_add(ResponseBody.expectedValues, ResponseBody.upperMargins), lower_ama=series_subtract(ResponseBody.expectedValues, ResponseBody.lowerMargins)
       | extend _tsid=toscalar(_tsid)
      )
};
let ts = demo_make_series2
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by sid;
ts
| invoke series_uv_anomalies_fl('num', 'sid', 90)
| join ts on $left._tsid == $right.sid
| project-away _tsid
| extend (ad_adx, score_adx, baseline_adx)=series_decompose_anomalies(num, 1.5, -1, 'linefit')
| project-reorder num, *
| render anomalychart with(series=sid, xcolumn=TimeStamp, ycolumns=num, baseline_adx, baseline_ama, lower_ama, upper_ama, anomalycolumns=ad_adx, ad_ama)

Stored

let ts = demo_make_series2
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by sid;
ts
| invoke series_uv_anomalies_fl('num', 'sid', 90)
| join ts on $left._tsid == $right.sid
| project-away _tsid
| extend (ad_adx, score_adx, baseline_adx)=series_decompose_anomalies(num, 1.5, -1, 'linefit')
| project-reorder num, *
| render anomalychart with(series=sid, xcolumn=TimeStamp, ycolumns=num, baseline_adx, baseline_ama, lower_ama, upper_ama, anomalycolumns=ad_adx, ad_ama)

Output

The following graph shows anomalies detected by the Univariate Anomaly Detection API on TS1. You can also select TS2 or TS3 in the chart filter box.

Graph showing anomalies using the Univariate API on a time series.

The following graph shows the anomalies detected by native function on TS1.

Graph showing anomalies using the native function on a time series.

5.54 - series_uv_change_points_fl()

This article describes the series_uv_change_points_fl() user-defined function.

The function series_uv_change_points_fl() is a user-defined function (UDF) that finds change points in time series by calling the Univariate Anomaly Detection API, part of Azure Cognitive Services. The function accepts a limited set of time series as numerical dynamic arrays, the change point detection threshold, and the minimum size of the stable trend window. Each time series is converted into the required JSON format and posted to the Anomaly Detector service endpoint. The service response contains dynamic arrays of change points, their respective confidence scores, and the detected seasonality.
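
For reference, the per-series request body built by the embedded Python code looks roughly like this; the values are placeholders only:

# Sketch of the per-series request body built by series_uv_change_points_fl() before
# it's posted to the Anomaly Detector change point endpoint. Placeholder values only.
import json

series_values = [5.1, 5.3, 20.2, 20.4]     # one numeric dynamic array from y_series
score_threshold = 0.9                       # minimum confidence to declare a change point
trend_window = 5                            # minimal stable trend window

json_data = {
    "series": [{"value": v} for v in series_values],
    "threshold": score_threshold,
    "stableTrendWindow": trend_window,      # period is auto-detected by the service
}
print(json.dumps(json_data))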

Prerequisites

Syntax

T | invoke series_uv_change_points_fl(y_series [, score_threshold [, trend_window [, tsid]]])

Parameters

Name | Type | Required | Description
y_series | string | ✔️ | The name of the input table column containing the values of the series to be anomaly detected.
score_threshold | real | | A value specifying the minimum confidence to declare a change point. Each point whose confidence is above the threshold is defined as a change point. Default value: 0.9.
trend_window | integer | | A value specifying the minimal window size for robust calculation of trend changes. Default value: 5.
tsid | string | | The name of the input table column containing the time series ID. Can be omitted when analyzing a single time series.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required. In the following function definition, replace YOUR-AD-RESOURCE-NAME in the uri and YOUR-KEY in the Ocp-Apim-Subscription-Key of the header with your Anomaly Detector resource name and key.

let series_uv_change_points_fl=(tbl:(*), y_series:string, score_threshold:real=0.9, trend_window:int=5, tsid:string='_tsid')
{
    let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/changepoint/detect';
    let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
    let kwargs = bag_pack('y_series', y_series, 'score_threshold', score_threshold, 'trend_window', trend_window);
    let code = ```if 1:
        import json
        y_series = kargs["y_series"]
        score_threshold = kargs["score_threshold"]
        trend_window = kargs["trend_window"]
        json_str = []
        for i in range(len(df)):
            row = df.iloc[i, :]
            ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
            json_data = {'series': ts, "threshold":score_threshold, "stableTrendWindow": trend_window}     # auto-detect period, or we can force 'period': 84
            json_str = json_str + [json.dumps(json_data)]
        result = df
        result['json_str'] = json_str
    ```;
    tbl
    | evaluate python(typeof(*, json_str:string), code, kwargs)
    | extend _tsid = column_ifexists(tsid, 1)
    | partition by _tsid (
       project json_str
       | evaluate http_request_post(uri, headers, dynamic(null))
        | project period=ResponseBody.period, change_point=series_add(0, ResponseBody.isChangePoint), confidence=ResponseBody.confidenceScores
        | extend _tsid=toscalar(_tsid)
       )
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required. In the following function definition, replace YOUR-AD-RESOURCE-NAME in the uri and YOUR-KEY in the Ocp-Apim-Subscription-Key of the header with your Anomaly Detector resource name and key.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Time Series Change Points Detection by Azure Cognitive Service")
series_uv_change_points_fl(tbl:(*), y_series:string, score_threshold:real=0.9, trend_window:int=5, tsid:string='_tsid')
{
    let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/changepoint/detect';
    let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
    let kwargs = bag_pack('y_series', y_series, 'score_threshold', score_threshold, 'trend_window', trend_window);
    let code = ```if 1:
        import json
        y_series = kargs["y_series"]
        score_threshold = kargs["score_threshold"]
        trend_window = kargs["trend_window"]
        json_str = []
        for i in range(len(df)):
            row = df.iloc[i, :]
            ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
            json_data = {'series': ts, "threshold":score_threshold, "stableTrendWindow": trend_window}     # auto-detect period, or we can force 'period': 84
            json_str = json_str + [json.dumps(json_data)]
        result = df
        result['json_str'] = json_str
    ```;
    tbl
    | evaluate python(typeof(*, json_str:string), code, kwargs)
    | extend _tsid = column_ifexists(tsid, 1)
    | partition by _tsid (
       project json_str
       | evaluate http_request_post(uri, headers, dynamic(null))
        | project period=ResponseBody.period, change_point=series_add(0, ResponseBody.isChangePoint), confidence=ResponseBody.confidenceScores
        | extend _tsid=toscalar(_tsid)
       )
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let series_uv_change_points_fl=(tbl:(*), y_series:string, score_threshold:real=0.9, trend_window:int=5, tsid:string='_tsid')
{
    let uri = 'https://YOUR-AD-RESOURCE-NAME.cognitiveservices.azure.com/anomalydetector/v1.0/timeseries/changepoint/detect';
    let headers=dynamic({'Ocp-Apim-Subscription-Key': h'YOUR-KEY'});
    let kwargs = bag_pack('y_series', y_series, 'score_threshold', score_threshold, 'trend_window', trend_window);
    let code = ```if 1:
        import json
        y_series = kargs["y_series"]
        score_threshold = kargs["score_threshold"]
        trend_window = kargs["trend_window"]
        json_str = []
        for i in range(len(df)):
            row = df.iloc[i, :]
            ts = [{'value':row[y_series][j]} for j in range(len(row[y_series]))]
            json_data = {'series': ts, "threshold":score_threshold, "stableTrendWindow": trend_window}     # auto-detect period, or we can force 'period': 84
            json_str = json_str + [json.dumps(json_data)]
        result = df
        result['json_str'] = json_str
    ```;
    tbl
    | evaluate python(typeof(*, json_str:string), code, kwargs)
    | extend _tsid = column_ifexists(tsid, 1)
    | partition by _tsid (
       project json_str
       | evaluate http_request_post(uri, headers, dynamic(null))
        | project period=ResponseBody.period, change_point=series_add(0, ResponseBody.isChangePoint), confidence=ResponseBody.confidenceScores
        | extend _tsid=toscalar(_tsid)
       )
};
let ts = range x from 1 to 300 step 1
| extend y=iff(x between (100 .. 110) or x between (200 .. 220), 20, 5)
| extend ts=datetime(2021-01-01)+x*1d
| extend y=y+4*rand()
| summarize ts=make_list(ts), y=make_list(y)
| extend sid=1;
ts
| invoke series_uv_change_points_fl('y', 0.8, 10, 'sid')
| join ts on $left._tsid == $right.sid
| project-away _tsid
| project-reorder y, *      //  just to visualize the anomalies on top of y series
| render anomalychart with(xcolumn=ts, ycolumns=y, confidence, anomalycolumns=change_point)

Stored

let ts = range x from 1 to 300 step 1
| extend y=iff(x between (100 .. 110) or x between (200 .. 220), 20, 5)
| extend ts=datetime(2021-01-01)+x*1d
| extend y=y+4*rand()
| summarize ts=make_list(ts), y=make_list(y)
| extend sid=1;
ts
| invoke series_uv_change_points_fl('y', 0.8, 10, 'sid')
| join ts on $left._tsid == $right.sid
| project-away _tsid
| project-reorder y, *      //  just to visualize the anomalies on top of y series
| render anomalychart with(xcolumn=ts, ycolumns=y, confidence, anomalycolumns=change_point)

Output

The following graph shows change points on a time series.

Graph showing change points on a time series.

5.55 - time_weighted_avg_fl()

This article describes the time_weighted_avg_fl() user-defined function.

The function time_weighted_avg_fl() is a user-defined function (UDF) that calculates the time weighted average of a metric in a given time window, over input time bins. This function is similar to the summarize operator. It aggregates the metric by time bins, but instead of calculating the simple avg() of the metric value in each bin, it weights each value by its duration. The duration is defined from the timestamp of the current value to the timestamp of the next value.

There are two options to calculate the time weighted average. This function fills the value forward from the current sample until the next one. Alternatively, time_weighted_avg2_fl() linearly interpolates the metric value between consecutive samples.
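
The fill-forward weighting can be illustrated with a minimal Python sketch, simplified from the KQL implementation, using the first-hour Device1 samples from the example later in this article:

# Minimal sketch of fill-forward time-weighted averaging for a single key and a single
# 1-hour bin, using the Device1 samples from the example below: the value 100 holds for
# 45 minutes and the value 300 holds for the remaining 15 minutes of the hour.
durations_min = [45, 15]          # minutes each sample's value is held (fill forward)
values = [100, 300]

tw_sum = sum(v * d for v, d in zip(values, durations_min))
t_sum = sum(durations_min)
print(tw_sum / t_sum)             # 150.0, matching the first Device1 bin in the output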

Syntax

T | invoke time_weighted_avg_fl(t_col, y_col, key_col, stime, etime, dt)

Parameters

Name | Type | Required | Description
t_col | string | ✔️ | The name of the column containing the time stamp of the records.
y_col | string | ✔️ | The name of the column containing the metric value of the records.
key_col | string | ✔️ | The name of the column containing the partition key of the records.
stime | datetime | ✔️ | The start time of the aggregation window.
etime | datetime | ✔️ | The end time of the aggregation window.
dt | timespan | ✔️ | The aggregation time bin.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let time_weighted_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let _etime = etime + dt;
    let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union tbl_ex
    | where _ts between (stime.._etime)
    | partition hint.strategy=native by _key (
        order by _ts asc, _val nulls last
        | scan declare(f_value:real=0.0) with (step s: true => f_value = iff(isnull(_val), s.f_value, _val);) // fill forward null values
        | extend diff_t=(next(_ts)-_ts)/1m
    )
    | where isnotnull(diff_t)
    | summarize tw_sum=sum(f_value*diff_t), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
    | where t_sum > 0 and _ts <= etime
    | extend tw_avg = tw_sum/t_sum
    | project-away tw_sum, t_sum
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Time weighted average of a metric using fill forward interpolation")
time_weighted_avg_fl(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let _etime = etime + dt;
    let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union tbl_ex
    | where _ts between (stime.._etime)
    | partition hint.strategy=native by _key (
        order by _ts asc, _val nulls last
        | scan declare(f_value:real=0.0) with (step s: true => f_value = iff(isnull(_val), s.f_value, _val);) // fill forward null values
        | extend diff_t=(next(_ts)-_ts)/1m
    )
    | where isnotnull(diff_t)
    | summarize tw_sum=sum(f_value*diff_t), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
    | where t_sum > 0 and _ts <= etime
    | extend tw_avg = tw_sum/t_sum
    | project-away tw_sum, t_sum
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let time_weighted_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let _etime = etime + dt;
    let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union tbl_ex
    | where _ts between (stime.._etime)
    | partition hint.strategy=native by _key (
        order by _ts asc, _val nulls last
        | scan declare(f_value:real=0.0) with (step s: true => f_value = iff(isnull(_val), s.f_value, _val);) // fill forward null values
        | extend diff_t=(next(_ts)-_ts)/1m
    )
    | where isnotnull(diff_t)
    | summarize tw_sum=sum(f_value*diff_t), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
    | where t_sum > 0 and _ts <= etime
    | extend tw_avg = tw_sum/t_sum
    | project-away tw_sum, t_sum
};
let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(2021-04-26 00:00), 100, 'Device1',
    datetime(2021-04-26 00:45), 300, 'Device1',
    datetime(2021-04-26 01:15), 200, 'Device1',
    datetime(2021-04-26 00:00), 600, 'Device2',
    datetime(2021-04-26 00:30), 400, 'Device2',
    datetime(2021-04-26 01:30), 500, 'Device2',
    datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc

Stored

let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(2021-04-26 00:00), 100, 'Device1',
    datetime(2021-04-26 00:45), 300, 'Device1',
    datetime(2021-04-26 01:15), 200, 'Device1',
    datetime(2021-04-26 00:00), 600, 'Device2',
    datetime(2021-04-26 00:30), 400, 'Device2',
    datetime(2021-04-26 01:30), 500, 'Device2',
    datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc

Output

_ts | _key | val
2021-04-26 00:00:00.0000000 | Device1 | 150
2021-04-26 01:00:00.0000000 | Device1 | 225
2021-04-26 00:00:00.0000000 | Device2 | 500
2021-04-26 01:00:00.0000000 | Device2 | 400

The first value of Device1 is (45m*100 + 15m*300)/60m = 150, the second value is (15m*300 + 45m*200)/60m = 225.
The first value of Device2 is (30m*600 + 30m*400)/60m = 500, the second value is (30m*400 + 15m*500 + 15m*300)/60m = 400.

5.56 - time_weighted_avg2_fl()

This article describes the time_weighted_avg2_fl() user-defined function.

The function time_weighted_avg2_fl() is a user-defined function (UDF) that calculates the time weighted average of a metric in a given time window, over input time bins. This function is similar to the summarize operator. It aggregates the metric by time bins, but instead of calculating the simple avg() of the metric value in each bin, it weights each value by its duration. The duration is defined from the timestamp of the current value to the timestamp of the next value.

There are two options to calculate time weighted average. This function linearly interpolates the metric value between consecutive samples. Alternatively time_weighted_avg_fl() fills forward the value from the current sample until the next one.

Syntax

T | invoke time_weighted_avg2_fl(t_col, y_col, key_col, stime, etime, dt)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| t_col | string | ✔️ | The name of the column containing the time stamp of the records. |
| y_col | string | ✔️ | The name of the column containing the metric value of the records. |
| key_col | string | ✔️ | The name of the column containing the partition key of the records. |
| stime | datetime | ✔️ | The start time of the aggregation window. |
| etime | datetime | ✔️ | The end time of the aggregation window. |
| dt | timespan | ✔️ | The aggregation time bin. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let time_weighted_avg2_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let _etime = etime + dt;
    let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union tbl_ex
    | where _ts between (stime.._etime)
    | partition hint.strategy=native by _key (
      order by _ts desc, _val nulls last
    | scan declare(val1:real=0.0, t1:datetime) with (                // fill backward null values
        step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
    | extend dt1=(t1-_ts)/1m
    | order by _ts asc, _val nulls last
    | scan declare(val0:real=0.0, t0:datetime) with (                // fill forward null values
        step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
    | extend dt0=(_ts-t0)/1m
    | extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
    | scan with (                                                    // fill forward null twa values
        step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
    | extend diff_t=(next(_ts)-_ts)/1m
    )
    | where isnotnull(diff_t)
    | order by _key asc, _ts asc
    | extend next_twa_val=iff(_key == next(_key), next(_twa_val), _twa_val)
    | summarize tw_sum=sum((_twa_val+next_twa_val)*diff_t/2.0), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
    | where t_sum > 0 and _ts <= etime
    | extend tw_avg = tw_sum/t_sum
    | project-away tw_sum, t_sum
    | order by _key asc, _ts asc 
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Time weighted average of a metric using linear interpolation")
time_weighted_avg2_fl(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let _etime = etime + dt;
    let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union tbl_ex
    | where _ts between (stime.._etime)
    | partition hint.strategy=native by _key (
      order by _ts desc, _val nulls last
    | scan declare(val1:real=0.0, t1:datetime) with (                // fill backward null values
        step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
    | extend dt1=(t1-_ts)/1m
    | order by _ts asc, _val nulls last
    | scan declare(val0:real=0.0, t0:datetime) with (                // fill forward null values
        step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
    | extend dt0=(_ts-t0)/1m
    | extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
    | scan with (                                                    // fill forward null twa values
        step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
    | extend diff_t=(next(_ts)-_ts)/1m
    )
    | where isnotnull(diff_t)
    | order by _key asc, _ts asc
    | extend next_twa_val=iff(_key == next(_key), next(_twa_val), _twa_val)
    | summarize tw_sum=sum((_twa_val+next_twa_val)*diff_t/2.0), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
    | where t_sum > 0 and _ts <= etime
    | extend tw_avg = tw_sum/t_sum
    | project-away tw_sum, t_sum
    | order by _key asc, _ts asc 
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let time_weighted_avg2_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let _etime = etime + dt;
    let gridTimes = range _ts from stime to _etime step dt | extend _val=real(null), dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union tbl_ex
    | where _ts between (stime.._etime)
    | partition hint.strategy=native by _key (
      order by _ts desc, _val nulls last
    | scan declare(val1:real=0.0, t1:datetime) with (                // fill backward null values
        step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
    | extend dt1=(t1-_ts)/1m
    | order by _ts asc, _val nulls last
    | scan declare(val0:real=0.0, t0:datetime) with (                // fill forward null values
        step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
    | extend dt0=(_ts-t0)/1m
    | extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
    | scan with (                                                    // fill forward null twa values
        step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
    | extend diff_t=(next(_ts)-_ts)/1m
    )
    | where isnotnull(diff_t)
    | order by _key asc, _ts asc
    | extend next_twa_val=iff(_key == next(_key), next(_twa_val), _twa_val)
    | summarize tw_sum=sum((_twa_val+next_twa_val)*diff_t/2.0), t_sum =sum(diff_t) by bin_at(_ts, dt, stime), _key
    | where t_sum > 0 and _ts <= etime
    | extend tw_avg = tw_sum/t_sum
    | project-away tw_sum, t_sum
    | order by _key asc, _ts asc 
};
let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(2021-04-26 00:00), 100, 'Device1',
    datetime(2021-04-26 00:45), 300, 'Device1',
    datetime(2021-04-26 01:15), 200, 'Device1',
    datetime(2021-04-26 00:00), 600, 'Device2',
    datetime(2021-04-26 00:30), 400, 'Device2',
    datetime(2021-04-26 01:30), 500, 'Device2',
    datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg2_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc

Stored

let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(2021-04-26 00:00), 100, 'Device1',
    datetime(2021-04-26 00:45), 300, 'Device1',
    datetime(2021-04-26 01:15), 200, 'Device1',
    datetime(2021-04-26 00:00), 600, 'Device2',
    datetime(2021-04-26 00:30), 400, 'Device2',
    datetime(2021-04-26 01:30), 500, 'Device2',
    datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_avg2_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = tw_avg
| order by _key asc, _ts asc

Output

| _ts | _key | val |
|---|---|---|
| 2021-04-26 00:00:00.0000000 | Device1 | 218.75 |
| 2021-04-26 01:00:00.0000000 | Device1 | 206.25 |
| 2021-04-26 00:00:00.0000000 | Device2 | 462.5 |
| 2021-04-26 01:00:00.0000000 | Device2 | 412.5 |

The first value of Device1 is (45m*(100+300)/2 + 15m*(300+250)/2)/60m = 218.75; the second value is (15m*(250+200)/2 + 45m*200)/60m = 206.25.
The first value of Device2 is (30m*(600+400)/2 + 30m*(400+450)/2)/60m = 462.5; the second value is (30m*(450+500)/2 + 15m*(500+300)/2 + 15m*300)/60m = 412.5.
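
The same arithmetic can be verified directly with a print statement (a quick check; the durations and interpolated values are taken from the explanation above):

print Device1_bin1 = (45.0*(100+300)/2 + 15.0*(300+250)/2)/60, Device1_bin2 = (15.0*(250+200)/2 + 45.0*200)/60,
      Device2_bin1 = (30.0*(600+400)/2 + 30.0*(400+450)/2)/60, Device2_bin2 = (30.0*(450+500)/2 + 15.0*(500+300)/2 + 15.0*300)/60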

5.57 - time_weighted_val_fl()

This article describes the time_weighted_val_fl() user-defined function.

The function time_weighted_val_fl() is a user-defined function (UDF) that linearly interpolates the metric value using the time weighted average of the values of its previous point and its next point.

Syntax

T | invoke time_weighted_val_fl(t_col, y_col, key_col, stime, etime, dt)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| t_col | string | ✔️ | The name of the column containing the time stamp of the records. |
| y_col | string | ✔️ | The name of the column containing the metric value of the records. |
| key_col | string | ✔️ | The name of the column containing the partition key of the records. |
| stime | datetime | ✔️ | The start time of the aggregation window. |
| etime | datetime | ✔️ | The end time of the aggregation window. |
| dt | timespan | ✔️ | The aggregation time bin. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let time_weighted_val_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let gridTimes = range _ts from stime to etime step dt | extend _val=real(null), grid=1, dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union (tbl_ex | extend grid=0)
    | where _ts between (stime..etime)
    | partition hint.strategy=native by _key (
      order by _ts desc, _val nulls last
    | scan declare(val1:real=0.0, t1:datetime) with (                // fill backward null values
        step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
    | extend dt1=(t1-_ts)/1m
    | order by _ts asc, _val nulls last
    | scan declare(val0:real=0.0, t0:datetime) with (                // fill forward null values
        step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
    | extend dt0=(_ts-t0)/1m
    | extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
    | scan with (                                                    // fill forward null twa values
        step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
    | where grid == 0 or (grid == 1 and _ts != prev(_ts))
    )
    | project _ts, _key, _twa_val, orig_val=iff(grid == 1, 0, 1)
    | order by _key asc, _ts asc
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Linear interpolation of metric value by time weighted average")
time_weighted_val_fl(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let gridTimes = range _ts from stime to etime step dt | extend _val=real(null), grid=1, dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union (tbl_ex | extend grid=0)
    | where _ts between (stime..etime)
    | partition hint.strategy=native by _key (
      order by _ts desc, _val nulls last
    | scan declare(val1:real=0.0, t1:datetime) with (                // fill backward null values
        step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
    | extend dt1=(t1-_ts)/1m
    | order by _ts asc, _val nulls last
    | scan declare(val0:real=0.0, t0:datetime) with (                // fill forward null values
        step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
    | extend dt0=(_ts-t0)/1m
    | extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
    | scan with (                                                    // fill forward null twa values
        step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
    | where grid == 0 or (grid == 1 and _ts != prev(_ts))
    )
    | project _ts, _key, _twa_val, orig_val=iff(grid == 1, 0, 1)
    | order by _key asc, _ts asc
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let time_weighted_val_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, stime:datetime, etime:datetime, dt:timespan)
{
    let tbl_ex = tbl | extend _ts = column_ifexists(t_col, datetime(null)), _val = column_ifexists(y_col, 0.0), _key = column_ifexists(key_col, '');
    let gridTimes = range _ts from stime to etime step dt | extend _val=real(null), grid=1, dummy=1;
    let keys = materialize(tbl_ex | summarize by _key | extend dummy=1);
    gridTimes
    | join kind=fullouter keys on dummy
    | project-away dummy, dummy1
    | union (tbl_ex | extend grid=0)
    | where _ts between (stime..etime)
    | partition hint.strategy=native by _key (
      order by _ts desc, _val nulls last
    | scan declare(val1:real=0.0, t1:datetime) with (                // fill backward null values
        step s: true => val1=iff(isnull(_val), s.val1, _val), t1=iff(isnull(_val), s.t1, _ts);)
    | extend dt1=(t1-_ts)/1m
    | order by _ts asc, _val nulls last
    | scan declare(val0:real=0.0, t0:datetime) with (                // fill forward null values
        step s: true => val0=iff(isnull(_val), s.val0, _val), t0=iff(isnull(_val), s.t0, _ts);)
    | extend dt0=(_ts-t0)/1m
    | extend _twa_val=iff(dt0+dt1 == 0, _val, ((val0*dt1)+(val1*dt0))/(dt0+dt1))
    | scan with (                                                    // fill forward null twa values
        step s: true => _twa_val=iff(isnull(_twa_val), s._twa_val, _twa_val);)
    | where grid == 0 or (grid == 1 and _ts != prev(_ts))
    )
    | project _ts, _key, _twa_val, orig_val=iff(grid == 1, 0, 1)
    | order by _key asc, _ts asc
};
let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(2021-04-26 00:00), 100, 'Device1',
    datetime(2021-04-26 00:45), 300, 'Device1',
    datetime(2021-04-26 01:15), 200, 'Device1',
    datetime(2021-04-26 00:00), 600, 'Device2',
    datetime(2021-04-26 00:30), 400, 'Device2',
    datetime(2021-04-26 01:30), 500, 'Device2',
    datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_val_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = _twa_val
| order by _key asc, _ts asc

Stored

let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(2021-04-26 00:00), 100, 'Device1',
    datetime(2021-04-26 00:45), 300, 'Device1',
    datetime(2021-04-26 01:15), 200, 'Device1',
    datetime(2021-04-26 00:00), 600, 'Device2',
    datetime(2021-04-26 00:30), 400, 'Device2',
    datetime(2021-04-26 01:30), 500, 'Device2',
    datetime(2021-04-26 01:45), 300, 'Device2'
];
let minmax=materialize(tbl | summarize mint=min(ts), maxt=max(ts));
let stime=toscalar(minmax | project mint);
let etime=toscalar(minmax | project maxt);
let dt = 1h;
tbl
| invoke time_weighted_val_fl('ts', 'val', 'key', stime, etime, dt)
| project-rename val = _twa_val
| order by _key asc, _ts asc

Output

| _ts | _key | val | orig_val |
|---|---|---|---|
| 2021-04-26 00:00:00.0000000 | Device1 | 100 | 1 |
| 2021-04-26 00:45:00.0000000 | Device1 | 300 | 1 |
| 2021-04-26 01:00:00.0000000 | Device1 | 250 | 0 |
| 2021-04-26 01:15:00.0000000 | Device1 | 200 | 1 |
| 2021-04-26 00:00:00.0000000 | Device2 | 600 | 1 |
| 2021-04-26 00:30:00.0000000 | Device2 | 400 | 1 |
| 2021-04-26 01:00:00.0000000 | Device2 | 450 | 0 |
| 2021-04-26 01:30:00.0000000 | Device2 | 500 | 1 |
| 2021-04-26 01:45:00.0000000 | Device2 | 300 | 1 |

5.58 - time_window_rolling_avg_fl()

This article describes the time_window_rolling_avg_fl() user-defined function.

The function time_window_rolling_avg_fl() is a user-defined function (UDF) that calculates the rolling average of the required value over a constant duration time window.

Calculating the rolling average over a constant time window for regular time series (that is, having constant intervals) can be achieved using series_fir(), as the constant time window can be converted to a fixed width filter of equal coefficients. However, calculating it for irregular time series is more complex, as the actual number of samples in the window varies. Still, it can be achieved using the powerful scan operator.
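
For comparison, here's a minimal sketch of the series_fir() approach on a regular series. The synthetic 1-hour series, the three-sample (3-hour) backward window, and the column names are illustrative assumptions only:

range timestamp from datetime(2021-01-01) to datetime(2021-01-01 06:00) step 1h
| extend val = toreal(hourofday(timestamp))                                 // synthetic regular series
| summarize timestamp = make_list(timestamp), val = make_list(val)
| extend rolling_avg_3h = series_fir(val, dynamic([1, 1, 1]), true, false)  // normalized 3-sample backward moving average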

This type of rolling window calculation is required for use cases where the metric values are emitted only when changed (and not in constant intervals). For example, in IoT, edge devices send metrics to the cloud only upon change, to optimize communication bandwidth.

Syntax

T | invoke time_window_rolling_avg_fl(t_col, y_col, key_col, dt [, direction ])

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| t_col | string | ✔️ | The name of the column containing the time stamp of the records. |
| y_col | string | ✔️ | The name of the column containing the metric value of the records. |
| key_col | string | ✔️ | The name of the column containing the partition key of the records. |
| dt | timespan | ✔️ | The duration of the rolling window. |
| direction | int | | The aggregation direction. The possible values are +1 or -1. A rolling window is set from current time forward/backward respectively. Default is -1, as backward rolling window is the only possible method for streaming scenarios. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let time_window_rolling_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, dt:timespan, direction:int=int(-1))
{
    let tbl_ex = tbl | extend timestamp = column_ifexists(t_col, datetime(null)), value = column_ifexists(y_col, 0.0), key = column_ifexists(key_col, '');
    tbl_ex 
    | partition hint.strategy=shuffle by key 
    (
        extend timestamp=pack_array(timestamp, timestamp - direction*dt), delta = pack_array(-direction, direction)
        | mv-expand timestamp to typeof(datetime), delta to typeof(long)
        | sort by timestamp asc, delta desc    
        | scan declare (cum_sum:double=0.0, cum_count:long=0) with 
        (
            step s: true => cum_count = s.cum_count + delta, 
                            cum_sum = s.cum_sum + delta * value; 
        )
        | extend avg_value = iff(direction == 1, prev(cum_sum)/prev(cum_count), cum_sum/cum_count)
        | where delta == -direction 
        | project timestamp, value, avg_value, key
    )
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Series", docstring = "Time based rolling average of a metric")
time_window_rolling_avg_fl(tbl:(*), t_col:string, y_col:string, key_col:string, dt:timespan, direction:int=int(-1))
{
    let tbl_ex = tbl | extend timestamp = column_ifexists(t_col, datetime(null)), value = column_ifexists(y_col, 0.0), key = column_ifexists(key_col, '');
    tbl_ex 
    | partition hint.strategy=shuffle by key 
    (
        extend timestamp=pack_array(timestamp, timestamp - direction*dt), delta = pack_array(-direction, direction)
        | mv-expand timestamp to typeof(datetime), delta to typeof(long)
        | sort by timestamp asc, delta desc    
        | scan declare (cum_sum:double=0.0, cum_count:long=0) with 
        (
            step s: true => cum_count = s.cum_count + delta, 
                            cum_sum = s.cum_sum + delta * value; 
        )
        | extend avg_value = iff(direction == 1, prev(cum_sum)/prev(cum_count), cum_sum/cum_count)
        | where delta == -direction 
        | project timestamp, value, avg_value, key
    )
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let time_window_rolling_avg_fl=(tbl:(*), t_col:string, y_col:string, key_col:string, dt:timespan, direction:int=int(-1))
{
    let tbl_ex = tbl | extend timestamp = column_ifexists(t_col, datetime(null)), value = column_ifexists(y_col, 0.0), key = column_ifexists(key_col, '');
    tbl_ex 
    | partition hint.strategy=shuffle by key 
    (
        extend timestamp=pack_array(timestamp, timestamp - direction*dt), delta = pack_array(-direction, direction)
        | mv-expand timestamp to typeof(datetime), delta to typeof(long)
        | sort by timestamp asc, delta desc    
        | scan declare (cum_sum:double=0.0, cum_count:long=0) with 
        (
            step s: true => cum_count = s.cum_count + delta, 
                            cum_sum = s.cum_sum + delta * value; 
        )
        | extend avg_value = iff(direction == 1, prev(cum_sum)/prev(cum_count), cum_sum/cum_count)
        | where delta == -direction 
        | project timestamp, value, avg_value, key
    )
};
let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(8:00), 1, 'Device1',
    datetime(8:01), 2, 'Device1',
    datetime(8:05), 3, 'Device1',
    datetime(8:05), 10, 'Device2',
    datetime(8:09), 20, 'Device2',
    datetime(8:40), 4, 'Device1',
    datetime(9:00), 5, 'Device1',
    datetime(9:01), 6, 'Device1',
    datetime(9:05), 30, 'Device2',
    datetime(9:50), 7, 'Device1'
];
tbl
| invoke time_window_rolling_avg_fl('ts', 'val', 'key', 10m)

Stored

let tbl = datatable(ts:datetime,  val:real, key:string) [
    datetime(8:00), 1, 'Device1',
    datetime(8:01), 2, 'Device1',
    datetime(8:05), 3, 'Device1',
    datetime(8:05), 10, 'Device2',
    datetime(8:09), 20, 'Device2',
    datetime(8:40), 4, 'Device1',
    datetime(9:00), 5, 'Device1',
    datetime(9:01), 6, 'Device1',
    datetime(9:05), 30, 'Device2',
    datetime(9:50), 7, 'Device1'
];
tbl
| invoke time_window_rolling_avg_fl('ts', 'val', 'key', 10m)

Output

| timestamp | value | avg_value | key |
|---|---|---|---|
| 2021-11-29 08:05:00.0000000 | 10 | 10 | Device2 |
| 2021-11-29 08:09:00.0000000 | 20 | 15 | Device2 |
| 2021-11-29 09:05:00.0000000 | 30 | 30 | Device2 |
| 2021-11-29 08:00:00.0000000 | 1 | 1 | Device1 |
| 2021-11-29 08:01:00.0000000 | 2 | 1.5 | Device1 |
| 2021-11-29 08:05:00.0000000 | 3 | 2 | Device1 |
| 2021-11-29 08:40:00.0000000 | 4 | 4 | Device1 |
| 2021-11-29 09:00:00.0000000 | 5 | 5 | Device1 |
| 2021-11-29 09:01:00.0000000 | 6 | 5.5 | Device1 |
| 2021-11-29 09:50:00.0000000 | 7 | 7 | Device1 |

The first value (10) at 8:05 contains only a single sample, which fell in the 10-minute backward window. The second value (15) is the average of the two samples at 8:09 and at 8:05, and so on.
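
The same arithmetic can be verified directly (a quick check; the sample values are taken from the output above):

print avg_Device2_0809 = (10.0 + 20.0)/2, avg_Device1_0801 = (1.0 + 2.0)/2, avg_Device1_0805 = (1.0 + 2.0 + 3.0)/3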

5.59 - two_sample_t_test_fl()

This article describes the two_sample_t_test_fl() user-defined function.

The function two_sample_t_test_fl() is a user-defined function (UDF) that performs the Two-Sample T-Test.

Syntax

T | invoke two_sample_t_test_fl(data1, data2, test_statistic, p_value, equal_var)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| data1 | string | ✔️ | The name of the column containing the first set of data to be used for the test. |
| data2 | string | ✔️ | The name of the column containing the second set of data to be used for the test. |
| test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
| p_value | string | ✔️ | The name of the column to store p-value for the results. |
| equal_var | bool | | If true (default), performs a standard independent 2 sample test that assumes equal population variances. If false, performs Welch’s t-test, which does not assume equal population variance. As mentioned above, consider using the native welch_test(). |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let two_sample_t_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, equal_var:bool=true)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'equal_var', equal_var);
    let code = ```if 1:
        from scipy import stats
        import pandas
        
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        equal_var = kargs["equal_var"]
        
        def func(row):
            statistics = stats.ttest_ind(row[data1], row[data2], equal_var=equal_var)
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Two-Sample t-Test")
two_sample_t_test_fl(tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, equal_var:bool=true)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'equal_var', equal_var);
    let code = ```if 1:
        from scipy import stats
        import pandas
        
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        equal_var = kargs["equal_var"]
        
        def func(row):
            statistics = stats.ttest_ind(row[data1], row[data2], equal_var=equal_var)
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let two_sample_t_test_fl = (tbl:(*), data1:string, data2:string, test_statistic:string, p_value:string, equal_var:bool=true)
{
    let kwargs = bag_pack('data1', data1, 'data2', data2, 'test_statistic', test_statistic, 'p_value', p_value, 'equal_var', equal_var);
    let code = ```if 1:
        from scipy import stats
        import pandas
        
        data1 = kargs["data1"]
        data2 = kargs["data2"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        equal_var = kargs["equal_var"]
        
        def func(row):
            statistics = stats.ttest_ind(row[data1], row[data2], equal_var=equal_var)
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke two_sample_t_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Stored

datatable(id:string, sample1:dynamic, sample2:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]), dynamic([27.1, 22.12, 33.56]),
'Test #2', dynamic([20.85, 21.89, 23.41]), dynamic([35.09, 30.02, 26.52]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02]), dynamic([32.2, 32.79, 33.9, 34.22])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke two_sample_t_test_fl('sample1', 'sample2', 'test_stat', 'p_val')

Output

| ID | sample1 | sample2 | test_stat | p_val |
|---|---|---|---|---|
| Test #1 | [23.64, 20.57, 20.42] | [27.1, 22.12, 33.56] | -1.7415675457565645 | 0.15655096653487446 |
| Test #2 | [20.85, 21.89, 23.41] | [35.09, 30.02, 26.52] | -3.2711673491022579 | 0.030755331219276136 |
| Test #3 | [20.13, 20.5, 21.7, 22.02] | [32.2, 32.79, 33.9, 34.22] | -18.5515946201742 | 1.5823717131966134E-06 |

5.60 - User-defined functions

This article describes user-defined functions (scalar and views).

User-defined functions are reusable subqueries that can be defined as part of the query itself (query-defined functions), or stored as part of the database metadata (stored functions). User-defined functions are invoked through a name, are provided with zero or more input arguments (which can be scalar or tabular), and produce a single value (which can be scalar or tabular) based on the function body.

A user-defined function belongs to one of two categories:

  • Scalar functions
  • Tabular functions

The function’s input arguments and output determine whether it’s scalar or tabular, which then establishes how it might be used.

To optimize multiple uses of the user-defined functions within a single query, see Optimize queries that use named expressions.

We’ve created an assortment of user-defined functions that you can use in your queries. For more information, see Functions library.

Scalar function

  • Has zero input arguments, or all its input arguments are scalar values
  • Produces a single scalar value
  • Can be used wherever a scalar expression is allowed
  • May only use the row context in which it’s defined
  • Can only refer to tables (and views) that are in the accessible schema

Tabular function

  • Accepts one or more tabular input arguments, and zero or more scalar input arguments, and/or:
  • Produces a single tabular value

Function names

Valid user-defined function names must follow the same identifier naming rules as other entities.

The name must also be unique in its scope of definition.

Input arguments

Valid user-defined functions follow these rules:

  • A user-defined function has a strongly typed list of zero or more input arguments.
  • An input argument has a name, a type, and (for scalar arguments) a default value.
  • The name of an input argument is an identifier.
  • The type of an input argument is either one of the scalar data types, or a tabular schema.

Syntactically, the input arguments list is a comma-separated list of argument definitions, wrapped in parentheses. Each argument definition is specified as

ArgName:ArgType [= ArgDefaultValue]

For tabular arguments, ArgType has the same syntax as the table definition (parentheses and a list of column name/type pairs), with the addition of a solitary (*) indicating “any tabular schema”.

For example:

| Syntax | Input arguments list description |
|---|---|
| () | No arguments |
| (s:string) | Single scalar argument called s taking a value of type string |
| (a:long, b:bool=true) | Two scalar arguments, the second of which has a default value |
| (T1:(*), T2:(r:real), b:bool) | Three arguments (two tabular arguments and one scalar argument) |
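
To illustrate the syntax, the following minimal sketch combines a tabular argument with a fixed schema, a required scalar argument, and a scalar argument with a default value. The names FilterAbove, val, threshold, and inclusive are illustrative only:

let FilterAbove = (T:(val:real), threshold:real, inclusive:bool=false) {
    // Keep rows whose val is above (or at, when inclusive) the threshold
    T | where iff(inclusive, val >= threshold, val > threshold)
};
FilterAbove((datatable(val:real)[1.0, 2.5, 3.75, 5.0]), 2.5, true)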

Examples

Scalar function

let Add7 = (arg0:long = 5) { arg0 + 7 };
range x from 1 to 10 step 1
| extend x_plus_7 = Add7(x), five_plus_seven = Add7()

Tabular function with no arguments

let tenNumbers = () { range x from 1 to 10 step 1};
tenNumbers
| extend x_plus_7 = x + 7

Tabular function with arguments

let MyFilter = (T:(x:long), v:long) {
  T | where x >= v
};
MyFilter((range x from 1 to 10 step 1), 9)

Output

x
9
10

The following example shows a tabular function that uses a tabular input with no columns specified. Any table can be passed to the function, and no table columns can be referenced inside it.

let MyDistinct = (T:(*)) {
  T | distinct *
};
MyDistinct((range x from 1 to 3 step 1))

Output

x
1
2
3

Declaring user-defined functions

The declaration of a user-defined function provides:

  • Function name
  • Function schema (parameters it accepts, if any)
  • Function body
let f=(s:string, i:long) {
    tolong(s) * i
};

The function body includes:

  • Exactly one expression, which provides the function’s return value (scalar or tabular value).
  • Any number (zero or more) of let statements, whose scope is that of the function body. If specified, the let statements must precede the expression defining the function’s return value.
  • Any number (zero or more) of query parameters statements, which declare query parameters used by the function. If specified, they must precede the expression defining the function’s return value.

Examples of user-defined functions

The following section shows examples of how to use user-defined functions.

User-defined function that uses a let statement

The following example shows a user-defined function (lambda) that accepts a parameter named ID. The function is bound to the name Test and makes use of three let statements, in which the Test3 definition uses the ID parameter. When run, the output from the query is 70:

let Test = (id: int) {
  let Test2 = 10;
  let Test3 = 10 + Test2 + id;
  let Test4 = (arg: int) {
      let Test5 = 20;
      Test2 + Test3 + Test5 + arg
  };
  Test4(10)
};
range x from 1 to Test(10) step 1
| count

User-defined function that defines a default value for a parameter

The following example shows a function that accepts three arguments. The latter two have a default value and don’t have to be present at the call site.

let f = (a:long, b:string = "b.default", c:long = 0) {
  strcat(a, "-", b, "-", c)
};
print f(12, c=7) // Returns "12-b.default-7"

Invoking a user-defined function

The method to invoke a user-defined function depends on the arguments that the function expects to receive. The following sections cover how to invoke a UDF without arguments, invoke a UDF with scalar arguments, and invoke a UDF with tabular arguments.

Invoke a UDF without arguments

A user-defined function that takes no arguments can be invoked either by its name, or by its name and an empty argument list in parentheses.

// Bind the identifier a to a user-defined function (lambda) that takes
// no arguments and returns a constant of type long:
let a=(){123};
// Invoke the function in two equivalent ways:
range x from 1 to 10 step 1
| extend y = x * a, z = x * a()
// Bind the identifier T to a user-defined function (lambda) that takes
// no arguments and returns a random two-by-two table:
let T=(){
  range x from 1 to 2 step 1
  | project x1 = rand(), x2 = rand()
};
// Invoke the function in two equivalent ways:
// (Note that the second invocation must be itself wrapped in
// an additional set of parentheses, as the union operator
// differentiates between "plain" names and expressions)
union T, (T())

Invoke a UDF with scalar arguments

A user-defined function that takes one or more scalar arguments can be invoked by using the function name and a concrete argument list in parentheses:

let f=(a:string, b:string) {
  strcat(a, " (la la la)", b)
};
print f("hello", "world")

Invoke a UDF with tabular arguments

A user-defined function that takes one or more table arguments (with any number of scalar arguments) can be invoked using the function name and a concrete argument list in parentheses:

let MyFilter = (T:(x:long), v:long) {
  T | where x >= v
};
MyFilter((range x from 1 to 10 step 1), 9)

You can also use the invoke operator to invoke a user-defined function that takes one or more table arguments and returns a table. This approach is useful when the first concrete table argument to the function is the source of the invoke operator:

let append_to_column_a=(T:(a:string), what:string) {
    T | extend a=strcat(a, " ", what)
};
datatable (a:string) ["sad", "really", "sad"]
| invoke append_to_column_a(":-)")

Default values

Functions may provide default values to some of their parameters under the following conditions:

  • Default values may be provided for scalar parameters only.
  • Default values are always literals (constants). They can’t be arbitrary calculations.
  • Parameters with no default value always precede parameters that do have a default value.
  • Callers must provide the value of all parameters with no default values arranged in the same order as the function declaration.
  • Callers don’t need to provide the value for parameters with default values, but may do so.
  • Callers may provide arguments in an order that doesn’t match the order of the parameters. If so, they must name their arguments.

The following example returns a table with two identical records. In the first invocation of f, the arguments are completely “scrambled”, so each one is explicitly given a name:

let f = (a:long, b:string = "b.default", c:long = 0) {
  strcat(a, "-", b, "-", c)
};
union
  (print x=f(c=7, a=12)), // "12-b.default-7"
  (print x=f(12, c=7))    // "12-b.default-7"

Output

x
12-b.default-7
12-b.default-7

View functions

A user-defined function that takes no arguments and returns a tabular expression can be marked as a view. Marking a user-defined function as a view means that the function behaves like a table whenever a wildcard table name resolution is performed.

The following example shows two user-defined functions, T_view and T_notview, and shows how only the first one is resolved by the wildcard reference in the union:

let T_view = view () { print x=1 };
let T_notview = () { print x=2 };
union T*

Restrictions

The following restrictions apply:

  • User-defined functions can’t pass into toscalar() invocation information that depends on the row-context in which the function is called.
  • User-defined functions that return a tabular expression can’t be invoked with an argument that varies with the row context.
  • A function taking at least one tabular input can’t be invoked on a remote cluster.
  • A scalar function can’t be invoked on a remote cluster.

The only place a user-defined function may be invoked with an argument that varies with the row context is when the user-defined function is composed of scalar functions only and doesn’t use toscalar().

Examples

Supported scalar function

The following query is supported because f is a scalar function that doesn’t reference any tabular expression.

let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { now() + hours*1h };
Table2 | where Column != 123 | project d = f(10)

The following query is supported because f is a scalar function that references the tabular expression Table1 but is invoked with no reference to the current row context f(10):

let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { toscalar(Table1 | summarize min(xdate) - hours*1h) };
Table2 | where Column != 123 | project d = f(10)

Unsupported scalar function

The following query isn’t supported because f is a scalar function that references the tabular expression Table1, and is invoked with a reference to the current row context f(Column):

let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { toscalar(Table1 | summarize min(xdate) - hours*1h) };
Table2 | where Column != 123 | project d = f(Column)

Unsupported tabular function

The following query isn’t supported because f is a tabular function that is invoked in a context that expects a scalar value.

let Table1 = datatable(xdate:datetime)[datetime(1970-01-01)];
let Table2 = datatable(Column:long)[1235];
let f = (hours:long) { range x from 1 to hours step 1 | summarize make_list(x) };
Table2 | where Column != 123 | project d = f(Column)

Features that are currently unsupported by user-defined functions

For completeness, here are some commonly requested features for user-defined functions that are currently not supported:

  1. Function overloading: There’s currently no way to overload a function (a way to create multiple functions with the same name and different input schema).

  2. Default values: The default value for a scalar parameter to a function must be a scalar literal (constant).

5.61 - wilcoxon_test_fl()

This article describes the wilcoxon_test_fl() user-defined function.

The function wilcoxon_test_fl() is a user-defined function (UDF) that performs the Wilcoxon Test.

Syntax

T | invoke wilcoxon_test_fl(data, test_statistic, p_value)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| data | string | ✔️ | The name of the column containing the data to be used for the test. |
| test_statistic | string | ✔️ | The name of the column to store test statistic value for the results. |
| p_value | string | ✔️ | The name of the column to store p-value for the results. |

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Query-defined

Define the function using the following let statement. No permissions are required.

let wilcoxon_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data = kargs["data"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.wilcoxon(row[data])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
// Write your query to use the function here.

Stored

Define the stored function once using the following .create function. Database User permissions are required.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Wilcoxon Test")
wilcoxon_test_fl(tbl:(*), data:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data = kargs["data"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.wilcoxon(row[data])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
}

Example

The following example uses the invoke operator to run the function.

Query-defined

To use a query-defined function, invoke it after the embedded function definition.

let wilcoxon_test_fl = (tbl:(*), data:string, test_statistic:string, p_value:string)
{
    let kwargs = bag_pack('data', data, 'test_statistic', test_statistic, 'p_value', p_value);
    let code = ```if 1:
        from scipy import stats
        data = kargs["data"]
        test_statistic = kargs["test_statistic"]
        p_value = kargs["p_value"]
        def func(row):
            statistics = stats.wilcoxon(row[data])
            return statistics[0], statistics[1]
        result = df
        result[[test_statistic, p_value]]  = df.apply(func, axis=1, result_type = "expand")
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]),
'Test #2', dynamic([20.85, 21.89, 23.41]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke wilcoxon_test_fl('sample1', 'test_stat', 'p_val')

Stored

datatable(id:string, sample1:dynamic) [
'Test #1', dynamic([23.64, 20.57, 20.42]),
'Test #2', dynamic([20.85, 21.89, 23.41]),
'Test #3', dynamic([20.13, 20.5, 21.7, 22.02])
]
| extend test_stat= 0.0, p_val = 0.0
| invoke wilcoxon_test_fl('sample1', 'test_stat', 'p_val')

Output

| ID | sample1 | test_stat | p_val |
|---|---|---|---|
| Test #1 | [23.64, 20.57, 20.42] | 0 | 0.10880943004054568 |
| Test #2 | [20.85, 21.89, 23.41] | 0 | 0.10880943004054568 |
| Test #3 | [20.13, 20.5, 21.7, 22.02] | 0 | 0.06788915486182899 |

6 - Geospatial

6.1 - geo_angle()

Learn how to use the geo_angle() function to calculate the angle between two lines on Earth.

Calculates clockwise angle in radians between two lines on Earth. The first line is [point1, point2] and the second line is [point2, point3].

Syntax

geo_angle(p1_longitude,p1_latitude,p2_longitude,p2_latitude,p3_longitude,p3_latitude)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| p1_longitude | real | ✔️ | The longitude value in degrees of the first geospatial coordinate. A valid value is in the range [-180, +180]. |
| p1_latitude | real | ✔️ | The latitude value in degrees of the first geospatial coordinate. A valid value is in the range [-90, +90]. |
| p2_longitude | real | ✔️ | The longitude value in degrees of the second geospatial coordinate. A valid value is in the range [-180, +180]. |
| p2_latitude | real | ✔️ | The latitude value in degrees of the second geospatial coordinate. A valid value is in the range [-90, +90]. |
| p3_longitude | real | ✔️ | The longitude value in degrees of the third geospatial coordinate. A valid value is in the range [-180, +180]. |
| p3_latitude | real | ✔️ | The latitude value in degrees of the third geospatial coordinate. A valid value is in the range [-90, +90]. |

Returns

An angle in radians in the range [0, 2pi) between the two lines [p1, p2] and [p2, p3]. The angle is measured clockwise from the first line to the second line.

Examples

The following example calculates the angle in radians.

print angle_in_radians = geo_angle(0, 10, 0, 5, 3, -10)

Output

angle_in_radians
2.94493843406882

The following example calculates the angle in degrees.

let angle_in_radians = geo_angle(0, 10, 0, 5, 3, -10);
print angle_in_degrees = degrees(angle_in_radians)

Output

angle_in_degrees
168.732543198009

The following example returns null because the first point equals the second point.

print is_null = isnull(geo_angle(0, 10, 0, 10, 3, -10))

Output

is_null
True

6.2 - geo_azimuth()

Learn how to use the geo_azimuth() function to calculate the angle between the true north and a line on Earth.

Calculates clockwise angle in radians between the line from point1 to true north and a line from point1 to point2 on Earth.

Syntax

geo_azimuth(p1_longitude,p1_latitude,p2_longitude,p2_latitude)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| p1_longitude | real | ✔️ | The longitude value in degrees of the first geospatial coordinate. A valid value is in the range [-180, +180]. |
| p1_latitude | real | ✔️ | The latitude value in degrees of the first geospatial coordinate. A valid value is in the range [-90, +90]. |
| p2_longitude | real | ✔️ | The longitude value in degrees of the second geospatial coordinate. A valid value is in the range [-180, +180]. |
| p2_latitude | real | ✔️ | The latitude value in degrees of the second geospatial coordinate. A valid value is in the range [-90, +90]. |

Returns

An angle in radians between the line from point p1 to true north and line [p1, p2]. The angle is measured clockwise.

Examples

The following example calculates azimuth in radians.

print azimuth_in_radians = geo_azimuth(5, 10, 10, -40)

Output

azimuth_in_radians
3.05459939796449

The following example calculates azimuth in degrees.

let azimuth_in_radians = geo_azimuth(5, 10, 10, -40);
print azimuth_in_degrees = degrees(azimuth_in_radians);

Output

azimuth_in_degrees
175.015653606568

The following example uses the location telemetry emitted by a truck as it travels to determine its travel direction.

let get_direction = (azimuth:real)
{
    let pi = pi();
    iff(azimuth < pi/2,   "North-East",
    iff(azimuth < pi,     "South-East",
    iff(azimuth < 3*pi/2, "South-West",
                          "North-West")));
};
datatable(timestamp:datetime, lng:real, lat:real)
[
    datetime(2024-01-01T00:01:53.048506Z), -115.4036607693417, 36.40551631046261,
    datetime(2024-01-01T00:02:53.048506Z), -115.3256807623232, 36.34102142760111,
    datetime(2024-01-01T00:03:53.048506Z), -115.2732290602112, 36.28458914829917,
    datetime(2024-01-01T00:04:53.048506Z), -115.2513186233914, 36.27622394664352,
    datetime(2024-01-01T00:05:53.048506Z), -115.2352055633212, 36.27545547038515,
    datetime(2024-01-01T00:06:53.048506Z), -115.1894341934856, 36.28266934431671,
    datetime(2024-01-01T00:07:53.048506Z), -115.1054318118468, 36.28957085435267,
    datetime(2024-01-01T00:08:53.048506Z), -115.0648614339413, 36.28110743285072,
    datetime(2024-01-01T00:09:53.048506Z), -114.9858032867736, 36.29780696509714,
    datetime(2024-01-01T00:10:53.048506Z), -114.9016966527561, 36.36556196813566,
]
| sort by timestamp asc 
| extend prev_lng = prev(lng), prev_lat = prev(lat)
| where isnotnull(prev_lng) and isnotnull(prev_lat)
| extend direction = get_direction(geo_azimuth(prev_lng, prev_lat, lng, lat))
| project direction, lng, lat
| render scatterchart with (kind = map)

Output

Azimuth between two consecutive locations.

The following example returns true because the first point equals the second point.

print is_null = isnull(geo_azimuth(5, 10, 5, 10))

Output

is_null
true

6.3 - geo_distance_2points()

Learn how to use the geo_distance_2points() function to calculate the shortest distance between two geospatial coordinates on Earth.

Calculates the shortest distance in meters between two geospatial coordinates on Earth.

Syntax

geo_distance_2points(p1_longitude,p1_latitude,p2_longitude,p2_latitude)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| p1_longitude | real | ✔️ | The longitude value in degrees of the first geospatial coordinate. A valid value is in the range [-180, +180]. |
| p1_latitude | real | ✔️ | The latitude value in degrees of the first geospatial coordinate. A valid value is in the range [-90, +90]. |
| p2_longitude | real | ✔️ | The longitude value in degrees of the second geospatial coordinate. A valid value is in the range [-180, +180]. |
| p2_latitude | real | ✔️ | The latitude value in degrees of the second geospatial coordinate. A valid value is in the range [-90, +90]. |

Returns

The shortest distance, in meters, between two geographic locations on Earth. If the coordinates are invalid, the query produces a null result.

Examples

The following example finds the shortest distance between Seattle and Los Angeles.

Distance between Seattle and Los Angeles.

print distance_in_meters = geo_distance_2points(-122.407628, 47.578557, -118.275287, 34.019056)

Output

distance_in_meters
1546754.35197381

The following example finds an approximation of the shortest path from Seattle to London. The line consists of coordinates along the LineString and within 500 meters from it.

range i from 1 to 1000000 step 1
| project lng = rand() * real(-122), lat = rand() * 90
| where lng between(real(-122) .. 0) and lat between(47 .. 90)
| where geo_distance_point_to_line(lng,lat,dynamic({"type":"LineString","coordinates":[[-122,47],[0,51]]})) < 500
| render scatterchart with (kind=map)

Output

Screenshot of the Seattle to London LineString.

The following example finds all rows in which the shortest distance between two coordinates is between one meter and 11 meters.

StormEvents
| extend distance_1_to_11m = geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat)
| where distance_1_to_11m between (1 .. 11)
| project distance_1_to_11m

Output

distance_1_to_11m
10.5723100154958
7.92153588248414

The following example returns a null result because of the invalid coordinate input.

print distance = geo_distance_2points(300,1,1,1)

Output

distance

6.4 - geo_distance_point_to_line()

Learn how to use the geo_distance_point_to_line() function to calculate the shortest distance between a coordinate and a line or multiline on Earth.

Calculates the shortest distance in meters between a coordinate and a line or multiline on Earth.

Syntax

geo_distance_point_to_line(longitude,latitude,lineString)

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| longitude | real | ✔️ | The geospatial coordinate longitude value in degrees. A valid value is in the range [-180, +180]. |
| latitude | real | ✔️ | The geospatial coordinate latitude value in degrees. A valid value is in the range [-90, +90]. |
| lineString | dynamic | ✔️ | A line or multiline in the GeoJSON format. |

Returns

The shortest distance, in meters, between a coordinate and a line or multiline on Earth. If the coordinate or lineString are invalid, the query produces a null result.

LineString definition and constraints

dynamic({"type": "LineString", "coordinates": [[lng_1,lat_1], [lng_2,lat_2], ..., [lng_N,lat_N]]})

dynamic({"type": "MultiLineString", "coordinates": [[line_1, line_2, ..., line_N]]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.
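
For reference, a minimal valid LineString needs only two coordinate pairs. The following sketch (arbitrary coordinates near Seattle, chosen for illustration) satisfies the constraints above:

print distance_in_meters = geo_distance_point_to_line(-122.33, 47.61, dynamic({"type":"LineString","coordinates":[[-122.34, 47.60],[-122.32, 47.62]]}))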

Examples

Shortest distance to airport

The following example finds the shortest distance between North Las Vegas Airport and a nearby road.

Screenshot of a map showing the distance between North Las Vegas Airport and a specific road.

print distance_in_meters = geo_distance_point_to_line(-115.199625, 36.210419, dynamic({ "type":"LineString","coordinates":[[-115.115385,36.229195],[-115.136995,36.200366],[-115.140252,36.192470],[-115.143558,36.188523],[-115.144076,36.181954],[-115.154662,36.174483],[-115.166431,36.176388],[-115.183289,36.175007],[-115.192612,36.176736],[-115.202485,36.173439],[-115.225355,36.174365]]}))

Output

distance_in_meters
3797.88887253334

Storm events across the south coast

The following example finds storm events along the US south coast filtered by a maximum distance of 5 km from the defined shore line.

let southCoast = dynamic({"type":"LineString","coordinates":[[-97.18505859374999,25.997549919572112],[-97.58056640625,26.96124577052697],[-97.119140625,27.955591004642553],[-94.04296874999999,29.726222319395504],[-92.98828125,29.82158272057499],[-89.18701171875,29.11377539511439],[-89.384765625,30.315987718557867],[-87.5830078125,30.221101852485987],[-86.484375,30.4297295750316],[-85.1220703125,29.6880527498568],[-84.00146484374999,30.14512718337613],[-82.6611328125,28.806173508854776],[-82.81494140625,28.033197847676377],[-82.177734375,26.52956523826758],[-80.9912109375,25.20494115356912]]});
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_distance_point_to_line(BeginLon, BeginLat, southCoast) < 5000
| render scatterchart with (kind=map)

Output

Screenshot of rendered storm events along the south coast of the US.

New York taxi pickups

The following example finds New York taxi pickups filtered by a maximum distance of 0.1 meters from the defined multiline.

let MadisonAve = dynamic({"type":"MultiLineString","coordinates":[[[-73.9879823,40.7408625],[-73.9876492,40.7413345],[-73.9874982,40.7415046],[-73.9870343,40.7421446],[-73.9865812,40.7427655],[-73.9861292,40.7433756],[-73.9856813,40.7439956],[-73.9854932,40.7442606],[-73.9852232,40.7446216],[-73.9847903,40.7452305],[-73.9846232,40.7454536],[-73.9844803,40.7456606],[-73.9843413,40.7458585],[-73.9839533,40.7463955],[-73.9839002,40.7464696],[-73.9837683,40.7466566],[-73.9834342,40.7471015],[-73.9833833,40.7471746],[-73.9829712,40.7477686],[-73.9824752,40.7484255],[-73.9820262,40.7490436],[-73.9815623,40.7496566],[-73.9811212,40.7502796],[-73.9809762,40.7504976],[-73.9806982,40.7509255],[-73.9802752,40.7515216],[-73.9798033,40.7521795],[-73.9795863,40.7524656],[-73.9793082,40.7528316],[-73.9787872,40.7534725],[-73.9783433,40.7540976],[-73.9778912,40.7547256],[-73.9774213,40.7553365],[-73.9769402,40.7559816],[-73.9764622,40.7565766],[-73.9760073,40.7572036],[-73.9755592,40.7578366],[-73.9751013,40.7584665],[-73.9746532,40.7590866],[-73.9741902,40.7597326],[-73.9737632,40.7603566],[-73.9733032,40.7609866],[-73.9728472,40.7616205],[-73.9723422,40.7622826],[-73.9718672,40.7629556],[-73.9714042,40.7635726],[-73.9709362,40.7642185],[-73.9705282,40.7647636],[-73.9704903,40.7648196],[-73.9703342,40.7650355],[-73.9701562,40.7652826],[-73.9700322,40.7654535],[-73.9695742,40.7660886],[-73.9691232,40.7667166],[-73.9686672,40.7673375],[-73.9682142,40.7679605],[-73.9677482,40.7685786],[-73.9672883,40.7692076],[-73.9668412,40.7698296],[-73.9663882,40.7704605],[-73.9659222,40.7710936],[-73.9654262,40.7717756],[-73.9649292,40.7724595],[-73.9644662,40.7730955],[-73.9640012,40.7737285],[-73.9635382,40.7743615],[-73.9630692,40.7749936],[-73.9626122,40.7756275],[-73.9621172,40.7763106],[-73.9616111,40.7769896],[-73.9611552,40.7776245],[-73.9606891,40.7782625],[-73.9602212,40.7788866],[-73.9597532,40.7795236],[-73.9595842,40.7797445],[-73.9592942,40.7801635],[-73.9591122,40.7804105],[-73.9587982,40.7808305],[-73.9582992,40.7815116],[-73.9578452,40.7821455],[-73.9573802,40.7827706],[-73.9569262,40.7833965],[-73.9564802,40.7840315],[-73.9560102,40.7846486],[-73.9555601,40.7852755],[-73.9551221,40.7859005],[-73.9546752,40.7865426],[-73.9542571,40.7871505],[-73.9541771,40.7872335],[-73.9540892,40.7873366],[-73.9536971,40.7879115],[-73.9532792,40.7884706],[-73.9532142,40.7885205],[-73.9531522,40.7885826],[-73.9527382,40.7891785],[-73.9523081,40.7897545],[-73.9518332,40.7904115],[-73.9513721,40.7910435],[-73.9509082,40.7916695],[-73.9504602,40.7922995],[-73.9499882,40.7929195],[-73.9495051,40.7936045],[-73.9490071,40.7942835],[-73.9485542,40.7949065],[-73.9480832,40.7955345],[-73.9476372,40.7961425],[-73.9471772,40.7967915],[-73.9466841,40.7974475],[-73.9453432,40.7992905],[-73.9448332,40.7999835],[-73.9443442,40.8006565],[-73.9438862,40.8012945],[-73.9434262,40.8019196],[-73.9431412,40.8023325],[-73.9429842,40.8025585],[-73.9425691,40.8031855],[-73.9424401,40.8033609],[-73.9422987,40.8035533],[-73.9422013,40.8036857],[-73.9421022,40.8038205],[-73.9420024,40.8039552],[-73.9416372,40.8044485],[-73.9411562,40.8050725],[-73.9406471,40.8057176],[-73.9401481,40.8064135],[-73.9397022,40.8070255],[-73.9394081,40.8074155],[-73.9392351,40.8076495],[-73.9387842,40.8082715],[-73.9384681,40.8087086],[-73.9383211,40.8089025],[-73.9378792,40.8095215],[-73.9374011,40.8101795],[-73.936405,40.8115707],[-73.9362328,40.8118098]],[[-73.9362328,40.8118098],[-73.9362432,40.8118567],[-73.9361239,40.8120222],[-73.9360302,40.8120805]],[[-73.93623
28,40.8118098],[-73.9361571,40.8118294],[-73.9360443,40.8119993],[-73.9360302,40.8120805]],[[-73.9360302,40.8120805],[-73.9359423,40.8121378],[-73.9358551,40.8122385],[-73.9352181,40.8130815],[-73.9348702,40.8135515],[-73.9347541,40.8137145],[-73.9346332,40.8138615],[-73.9345542,40.8139595],[-73.9344981,40.8139945],[-73.9344571,40.8140165],[-73.9343962,40.8140445],[-73.9343642,40.8140585],[-73.9343081,40.8140725],[-73.9341971,40.8140895],[-73.9341041,40.8141005],[-73.9340022,40.8140965],[-73.9338442,40.8141005],[-73.9333712,40.8140895],[-73.9325541,40.8140755],[-73.9324561,40.8140705],[-73.9324022,40.8140695]],[[-73.9360302,40.8120805],[-73.93605,40.8121667],[-73.9359632,40.8122805],[-73.9353631,40.8130795],[-73.9351482,40.8133625],[-73.9350072,40.8135415],[-73.9347441,40.8139168],[-73.9346611,40.8140125],[-73.9346101,40.8140515],[-73.9345401,40.8140965],[-73.9344381,40.8141385],[-73.9343451,40.8141555],[-73.9342991,40.8141675],[-73.9341552,40.8141985],[-73.9338601,40.8141885],[-73.9333991,40.8141815],[-73.9323981,40.8141665]]]});
nyc_taxi
| project pickup_longitude, pickup_latitude
| where geo_distance_point_to_line(pickup_longitude, pickup_latitude, MadisonAve) <= 0.1
| take 100
| render scatterchart with (kind=map)

Output

Screenshot of rendered NYC taxi pickups on Madison Ave.

The following example folds many lines into one multiline and queries this multiline. The query finds all taxi pickups that happened more than 10 km away from all roads in Manhattan.

let ManhattanRoads =
    datatable(features:dynamic)
    [
        dynamic({"type":"Feature","properties":{"Label":"145thStreetBrg"},"geometry":{"type":"MultiLineString","coordinates":[[[-73.9322259,40.8194635],[-73.9323259,40.8194743],[-73.9323973,40.8194779]]]}}),
        dynamic({"type":"Feature","properties":{"Label":"W120thSt"},"geometry":{"type":"MultiLineString","coordinates":[[[-73.9619541,40.8104844],[-73.9621542,40.8105725],[-73.9630542,40.8109455],[-73.9635902,40.8111714],[-73.9639492,40.8113174],[-73.9640502,40.8113705]]]}}),
        dynamic({"type":"Feature","properties":{"Label":"1stAve"},"geometry":{"type":"MultiLineString","coordinates":[[[-73.9704124,40.748033],[-73.9702043,40.7480906],[-73.9696892,40.7487346],[-73.9695012,40.7491976],[-73.9694522,40.7493196]],[[-73.9699932,40.7488636],[-73.9694522,40.7493196]],[[-73.9694522,40.7493196],[-73.9693113,40.7494946],[-73.9688832,40.7501056],[-73.9686562,40.7504196],[-73.9684231,40.7507476],[-73.9679832,40.7513586],[-73.9678702,40.7514986]],[[-73.9676833,40.7520426],[-73.9675462,40.7522286],[-73.9673532,40.7524976],[-73.9672892,40.7525906],[-73.9672122,40.7526806]]]}})
        // ... more roads ...
    ];
let allRoads=toscalar(
    ManhattanRoads
    | project road_coordinates=features.geometry.coordinates
    | summarize make_list(road_coordinates)
    | project multiline = bag_pack("type","MultiLineString", "coordinates", list_road_coordinates));
nyc_taxi
| project pickup_longitude, pickup_latitude
| where pickup_longitude != 0 and pickup_latitude != 0
| where geo_distance_point_to_line(pickup_longitude, pickup_latitude, parse_json(allRoads)) > 10000
| take 10
| render scatterchart with (kind=map)

Output

Screenshot of a query map rendering example of lines folded into a multiline. The example is all taxi pickups 10 km away from all Manhattan roads.

Invalid LineString

The following example returns a null result because of the invalid LineString input.

print distance_in_meters = geo_distance_point_to_line(1,1, dynamic({ "type":"LineString"}))

Output

distance_in_meters

Invalid coordinate

The following example returns a null result because of the invalid coordinate input.

print distance_in_meters = geo_distance_point_to_line(300, 3, dynamic({ "type":"LineString","coordinates":[[1,1],[2,2]]}))

Output

distance_in_meters

6.5 - geo_distance_point_to_polygon()

Learn how to use the geo_distance_point_to_polygon() function to calculate the shortest distance between a coordinate and a polygon or a multipolygon on Earth.

Calculates the shortest distance between a coordinate and a polygon or a multipolygon on Earth.

Syntax

geo_distance_point_to_polygon(longitude,latitude,polygon)

Parameters

Name | Type | Required | Description
longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180].
latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90].
polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format.

Returns

The shortest distance, in meters, between a coordinate and a polygon or a multipolygon on Earth. If polygon contains point, the distance will be 0. If the coordinates or polygons are invalid, the query will produce a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1,…, LinearRingHole_N],…, [LinearRingShell, LinearRingHole_1,…, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.
  • Polygon doesn’t necessarily contain its vertices.

Examples

The following example calculates shortest distance in meters from some location in NYC to Central Park.

let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print geo_distance_point_to_polygon(-73.9839, 40.7705, central_park)

Output

print_0
259.940756070596
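
A polygon can also carry holes as additional clockwise rings. The following sketch uses made-up coordinates (not taken from the article's datasets) to show that a point lying inside a hole isn't contained by the polygon, so its distance is greater than 0.

// Illustrative only: an outer counterclockwise shell with one clockwise hole.
let polygon_with_hole = dynamic({"type":"Polygon","coordinates":[[[-73.98,40.76],[-73.95,40.76],[-73.95,40.79],[-73.98,40.79],[-73.98,40.76]],[[-73.97,40.77],[-73.97,40.78],[-73.96,40.78],[-73.96,40.77],[-73.97,40.77]]]});
// The point falls inside the hole, so the expected distance is nonzero.
print distance_in_meters = geo_distance_point_to_polygon(-73.965, 40.775, polygon_with_hole)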

The following example enriches the data with distance.

let multipolygon = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]],[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.7796396837
51753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.988886144117032,40.779878898532999],[-73.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]],[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.942930
43781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]]});
let coordinates = 
    datatable(longitude:real, latitude:real, description:string)
    [
        real(-73.9741), 40.7914, 'Upper West Side',
        real(-73.9950), 40.7340, 'Greenwich Village',
        real(-73.8743), 40.7773, 'LaGuardia Airport',
    ];
coordinates
| extend distance = geo_distance_point_to_polygon(longitude, latitude, multipolygon)

Output

longitude | latitude | description | distance
-73.9741 | 40.7914 | Upper West Side | 0
-73.995 | 40.734 | Greenwich Village | 0
-73.8743 | 40.7773 | LaGuardia Airport | 5702.15731467514

The following example finds all US states that are within 200 km of the point, excluding the state that contains the point.

US_States
| project name = features.properties.NAME, polygon = features.geometry
| project name, distance = ceiling(geo_distance_point_to_polygon(-111.905, 40.634, polygon) / 1000)
| where distance < 200 and distance > 0

Output

name | distance
Idaho | 152
Nevada | 181
Wyoming | 83

The following example will return a null result because of the invalid coordinate input.

print distance = geo_distance_point_to_polygon(500,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,1],[0,0]]]}))

Output

distance

The following example will return a null result because of the invalid polygon input.

print distance = geo_distance_point_to_polygon(1,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]}))

Output

distance

6.6 - geo_geohash_neighbors()

Learn how to use the geo_geohash_neighbors() function to calculate geohash neighbors.

Calculates Geohash neighbors.

Read more about geohash.

Syntax

geo_geohash_neighbors(geohash)

Parameters

Name | Type | Required | Description
geohash | string | ✔️ | A geohash value as it was calculated by geo_point_to_geohash(). The geohash string must be between 1 and 18 characters.

Returns

An array of Geohash neighbors. If the Geohash is invalid, the query produces a null result.

Examples

The following example calculates Geohash neighbors.

print neighbors = geo_geohash_neighbors('sunny')

Output

neighbors
[“sunnt”,“sunpj”,“sunnx”,“sunpn”,“sunnv”,“sunpp”,“sunnz”,“sunnw”]
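
To test whether one geohash is a direct neighbor of another, check membership in the returned array, for example with set_has_element(). The following minimal sketch returns true because 'sunnt' appears in the neighbors list above.

let geohash1 = 'sunny';
let geohash2 = 'sunnt';
print are_neighbors = set_has_element(geo_geohash_neighbors(geohash1), geohash2)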

The following example calculates an array containing the input Geohash and its neighbors.

let geohash = 'sunny';
print cells = array_concat(pack_array(geohash), geo_geohash_neighbors(geohash))

Output

cells
[“sunny”,“sunnt”,“sunpj”,“sunnx”,“sunpn”,“sunnv”,“sunpp”,“sunnz”,“sunnw”]

The following example calculates a GeoJSON geometry collection of Geohash polygons.

let geohash = 'sunny';
print cells = array_concat(pack_array(geohash), geo_geohash_neighbors(geohash))
| mv-expand cells to typeof(string)
| project polygons = geo_geohash_to_polygon(cells)
| summarize arr = make_list(polygons)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", arr), "properties", bag_pack("name", "polygons"))

Output

geojson
{“type”: “Feature”,“geometry”: {“type”: “GeometryCollection”,“geometries”: [
{“type”:“Polygon”,“coordinates”:[[[42.451171875,23.6865234375],[42.4951171875,23.6865234375],[42.4951171875,23.73046875],[42.451171875,23.73046875],[42.451171875,23.6865234375]]]},
{“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.642578125],[42.451171875,23.642578125],[42.451171875,23.6865234375],[42.4072265625,23.6865234375],[42.4072265625,23.642578125]]]},
{“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.73046875],[42.451171875,23.73046875],[42.451171875,23.7744140625],[42.4072265625,23.7744140625],[42.4072265625,23.73046875]]]},
{“type”:“Polygon”,“coordinates”:[[[42.4951171875,23.642578125],[42.5390625,23.642578125],[42.5390625,23.6865234375],[42.4951171875,23.6865234375],[42.4951171875,23.642578125]]]},
{“type”:“Polygon”,“coordinates”:[[[42.451171875,23.73046875],[42.4951171875,23.73046875],[42.4951171875,23.7744140625],[42.451171875,23.7744140625],[42.451171875,23.73046875]]]},
{“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.6865234375],[42.451171875,23.6865234375],[42.451171875,23.73046875],[42.4072265625,23.73046875],[42.4072265625,23.6865234375]]]},
{“type”:“Polygon”,“coordinates”:[[[42.4951171875,23.73046875],[42.5390625,23.73046875],[42.5390625,23.7744140625],[42.4951171875,23.7744140625],[42.4951171875,23.73046875]]]},
{“type”:“Polygon”,“coordinates”:[[[42.4951171875,23.6865234375],[42.5390625,23.6865234375],[42.5390625,23.73046875],[42.4951171875,23.73046875],[42.4951171875,23.6865234375]]]},
{“type”:“Polygon”,“coordinates”:[[[42.451171875,23.642578125],[42.4951171875,23.642578125],[42.4951171875,23.6865234375],[42.451171875,23.6865234375],[42.451171875,23.642578125]]]}]},
“properties”: {“name”: “polygons”}}

The following example calculates the union of the polygons that represent the Geohash and its neighbors.

let geohash = 'sunny';
print cells = array_concat(pack_array(geohash), geo_geohash_neighbors(geohash))
| mv-expand cells to typeof(string)
| project polygons = geo_geohash_to_polygon(cells)
| summarize arr = make_list(polygons)
| project polygon = geo_union_polygons_array(arr)

Output

polygon
{“type”:“Polygon”,“coordinates”:[[[42.4072265625,23.642578125],[42.451171875,23.642578125],[42.4951171875,23.642578125],[42.5390625,23.642578125],[42.5390625,23.686523437500004],[42.5390625,23.730468750000004],[42.5390625,23.7744140625],[42.4951171875,23.7744140625],[42.451171875,23.7744140625],[42.407226562499993,23.7744140625],[42.4072265625,23.73046875],[42.4072265625,23.6865234375],[42.4072265625,23.642578125]]]}

The following example returns true because of the invalid Geohash token input.

print invalid = isnull(geo_geohash_neighbors('a'))

Output

invalid
1

6.7 - geo_geohash_to_central_point()

Learn how to use the geo_geohash_to_central_point() function to calculate the geospatial coordinates that represent the center of a geohash rectangular area.

Calculates the geospatial coordinates that represent the center of a geohash rectangular area.

Read more about geohash.

Syntax

geo_geohash_to_central_point(geohash)

Parameters

Name | Type | Required | Description
geohash | string | ✔️ | A geohash value as it was calculated by geo_point_to_geohash(). The geohash string must be between 1 and 18 characters.

Returns

The geospatial coordinate values in GeoJSON Format and of a dynamic data type. If the geohash is invalid, the query will produce a null result.

Examples

print point = geo_geohash_to_central_point("sunny")
| extend coordinates = point.coordinates
| extend longitude = coordinates[0], latitude = coordinates[1]

Output

point | coordinates | longitude | latitude
{“type”: “Point”, “coordinates”: [42.47314453125, 23.70849609375]} | [42.47314453125, 23.70849609375] | 42.47314453125 | 23.70849609375

The following example returns a null result because of the invalid geohash input.

print geohash = geo_geohash_to_central_point("a")

Output

geohash

You can use the geohash value to create a deep-link URL to Bing Maps by pointing to the geohash center point:

// Use string concatenation to create Bing Map deep-link URL from a geo-point
let point_to_map_url = (_point:dynamic, _title:string) 
{
    strcat('https://www.bing.com/maps?sp=point.', _point.coordinates[1] ,'_', _point.coordinates[0], '_', url_encode(_title)) 
};
// Convert geohash to center point, and then use 'point_to_map_url' to create Bing Map deep-link
let geohash_to_map_url = (_geohash:string, _title:string)
{
    point_to_map_url(geo_geohash_to_central_point(_geohash), _title)
};
print geohash = 'sv8wzvy7'
| extend url = geohash_to_map_url(geohash, "You are here")

Output

geohash | url
sv8wzvy7 | https://www.bing.com/maps?sp=point.32.15620994567871_34.80245590209961_You+are+here
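
The central point can also be fed to other geospatial functions. As a minimal sketch (the second coordinate pair is chosen only for illustration), the following query measures the distance in meters between the center of a geohash cell and a nearby point by using geo_distance_2points():

let center = geo_geohash_to_central_point('sv8wzvy7');
print distance_in_meters = geo_distance_2points(toreal(center.coordinates[0]), toreal(center.coordinates[1]), 34.8, 32.15)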

6.8 - geo_geohash_to_polygon()

Learn how to use the geo_geohash_to_polygon() function to calculate the polygon that represents the geohash rectangular area.

Calculates the polygon that represents the geohash rectangular area.

Read more about geohash.

Syntax

geo_geohash_to_polygon(geohash)

Parameters

Name | Type | Required | Description
geohash | string | ✔️ | A geohash value as it was calculated by geo_point_to_geohash(). The geohash string must be between 1 and 18 characters.

Returns

Polygon in GeoJSON Format and of a dynamic data type. If the geohash is invalid, the query will produce a null result.

Examples

print GeohashPolygon = geo_geohash_to_polygon("dr5ru");

Output

GeohashPolygon
{
“type”: “Polygon”,
“coordinates”: [
[[-74.00390625, 40.7373046875], [-73.9599609375, 40.7373046875], [-73.9599609375, 40.78125], [-74.00390625, 40.78125], [-74.00390625, 40.7373046875]]]
}
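
Because the result is a standard GeoJSON polygon, it can be passed to other polygon functions. For example, the following sketch estimates the area of the geohash cell in square meters with geo_polygon_area():

print geohash_area_m2 = geo_polygon_area(geo_geohash_to_polygon("dr5ru"))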

The following example assembles GeoJSON geometry collection of geohash polygons.

// Geohash GeoJSON collection
datatable(lng:real, lat:real)
[
    -73.975212, 40.789608,
    -73.916869, 40.818314,
    -73.989148, 40.743273,
]
| project geohash = geo_point_to_geohash(lng, lat, 5)
| project geohash_polygon = geo_geohash_to_polygon(geohash)
| summarize geohash_polygon_lst = make_list(geohash_polygon)
| project bag_pack(
    "type", "Feature",
    "geometry", bag_pack("type", "GeometryCollection", "geometries", geohash_polygon_lst),
    "properties", bag_pack("name", "Geohash polygons collection"))

Output

Column1
{
“type”: “Feature”,
“geometry”: {“type”: “GeometryCollection”,“geometries”: [
{“type”: “Polygon”, “coordinates”: [[[-74.00390625, 40.78125], [-73.9599609375, 40.78125], [-73.9599609375, 40.8251953125],[ -74.00390625, 40.8251953125], [ -74.00390625, 40.78125]]]},
{“type”: “Polygon”, “coordinates”: [[[ -73.9599609375, 40.78125], [-73.916015625, 40.78125], [-73.916015625, 40.8251953125], [-73.9599609375, 40.8251953125], [-73.9599609375, 40.78125]]]},
{“type”: “Polygon”, “coordinates”: [[[-74.00390625, 40.7373046875], [-73.9599609375, 40.7373046875], [-73.9599609375, 40.78125], [-74.00390625, 40.78125], [-74.00390625, 40.7373046875]]]}]
},
“properties”: {“name”: “Geohash polygons collection”
}}

The following example returns a null result because of the invalid geohash input.

print GeohashPolygon = geo_geohash_to_polygon("a");

Output

GeohashPolygon

6.9 - geo_h3cell_children()

Learn how to use the geo_h3cell_children() function to calculate the H3 cell children.

Calculates the H3 cell children.

Read more about H3 Cell.

Syntax

geo_h3cell_children(h3cell,resolution)

Parameters

Name | Type | Required | Description
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell().
resolution | int | | Defines the requested children cells resolution. Supported values are in the range [1, 15]. If unspecified, the immediate children tokens are calculated.

Returns

An array of H3 Cell children tokens. If the H3 Cell is invalid or the requested children resolution is lower than the given cell's resolution, the query will produce a null result.

Examples

print children = geo_h3cell_children('862a1072fffffff')

Output

children
[ “872a10728ffffff”, “872a10729ffffff”, “872a1072affffff”, “872a1072bffffff”, “872a1072cffffff”, “872a1072dffffff”, “872a1072effffff” ]

The following example counts the children three levels below a given cell. Each hexagonal H3 cell has seven children, so three levels down yields 7³ = 343 cells.

let h3_cell = '862a1072fffffff'; 
print children_count = array_length(geo_h3cell_children(h3_cell, geo_h3cell_level(h3_cell) + 3))

Output

children_count
343

The following example assembles GeoJSON geometry collection of H3 Cell children polygons.

print children = geo_h3cell_children('862a1072fffffff')
| mv-expand children to typeof(string)
| project child = geo_h3cell_to_polygon(children)
| summarize h3_hash_polygon_lst = make_list(child)
| project geojson = bag_pack(
    "type", "Feature",
    "geometry", bag_pack("type", "GeometryCollection", "geometries", h3_hash_polygon_lst),
    "properties", bag_pack("name", "H3 polygons collection"))

Output

geojson
{ “type”: “Feature”, “geometry”: { “type”: “GeometryCollection”, “geometries”: [ … … … ] }, “properties”: { “name”: “H3 polygons collection” }}

The following example returns true because of the invalid cell.

print is_null = isnull(geo_h3cell_children('abc'))

Output

is_null
1

The following example returns true because the level difference between the cell and its children is more than 5.

print is_null = isnull(geo_h3cell_children(geo_point_to_h3cell(1, 1, 9), 15))

Output

is_null
1

6.10 - geo_h3cell_level()

Learn how to use the geo_h3cell_level() function to calculate the H3 cell resolution.

Calculates the H3 cell resolution.

Read more about H3 Cell.

Syntax

geo_h3cell_level(h3cell)

Parameters

Name | Type | Required | Description
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell().

Returns

An integer that represents H3 Cell level. Valid level is in range [0, 15]. If the H3 Cell is invalid, the query will produce a null result.

Examples

print cell_res = geo_h3cell_level('862a1072fffffff')

Output

cell_res
6
print cell_res = geo_h3cell_level(geo_point_to_h3cell(1,1,10))

Output

cell_res
10

The following example returns true because of the invalid H3 Cell token input.

print invalid_res = isnull(geo_h3cell_level('abc'))

Output

invalid_res
1

6.11 - geo_h3cell_neighbors()

Learn how to use the geo_h3cell_neighbors() function to calculate the H3 cell neighbors.

Calculates the H3 cell neighbors.

Read more about H3 Cell.

Syntax

geo_h3cell_neighbors(h3cell)

Parameters

Name | Type | Required | Description
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell().

Returns

An array of H3 cell neighbors. If the H3 Cell is invalid, the query will produce a null result.

Examples

The following example calculates H3 cell neighbors.

print neighbors = geo_h3cell_neighbors('862a1072fffffff')

Output

neighbors
[“862a10727ffffff”,“862a10707ffffff”,“862a1070fffffff”,“862a10777ffffff”,“862a100dfffffff”,“862a100d7ffffff”]

The following example calculates an array containing the input H3 cell and its neighbors.

let h3cell = '862a1072fffffff';
print cells = array_concat(pack_array(h3cell), geo_h3cell_neighbors(h3cell))

Output

cells
[“862a1072fffffff”,“862a10727ffffff”,“862a10707ffffff”,“862a1070fffffff”,“862a10777ffffff”,“862a100dfffffff”,“862a100d7ffffff”]

The following example calculates a GeoJSON geometry collection of H3 cell polygons.

let h3cell = '862a1072fffffff';
print cells = array_concat(pack_array(h3cell), geo_h3cell_neighbors(h3cell))
| mv-expand cells to typeof(string)
| project polygons = geo_h3cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", arr), "properties", bag_pack("name", "polygons"))

Output

geojson
{“type”: “Feature”,“geometry”: {“type”: “GeometryCollection”,“geometries”: [
{“type”:“Polygon”,“coordinates”:[[[-74.0022744646159,40.735376026215022],[-74.046908029686236,40.727986222489115],[-74.060610712223664,40.696775140349033],[-74.029724408156682,40.672970047595463],[-73.985140983708192,40.680349049267583],[-73.971393761028622,40.71154393543933],[-74.0022744646159,40.735376026215022]]]},
{“type”:“Polygon”,“coordinates”:[[[-74.019448383546617,40.790439140236963],[-74.064132193843633,40.783038509825],[-74.077839665342211,40.751803958414136],[-74.046908029686236,40.727986222489115],[-74.0022744646159,40.735376026215022],[-73.988522328408948,40.766594382212254],[-74.019448383546617,40.790439140236963]]]},
{“type”:“Polygon”,“coordinates”:[[[-74.077839665342211,40.751803958414136],[-74.1224794808745,40.744383587828388],[-74.1361375042681,40.713156370029125],[-74.1052004095288,40.689365648097258],[-74.060610712223664,40.696775140349033],[-74.046908029686236,40.727986222489115],[-74.077839665342211,40.751803958414136]]]},
{“type”:“Polygon”,“coordinates”:[[[-74.060610712223664,40.696775140349033],[-74.1052004095288,40.689365648097258],[-74.118853750491638,40.658161927046628],[-74.0879619670209,40.634383824229609],[-74.043422283844933,40.641782462872115],[-74.029724408156682,40.672970047595463],[-74.060610712223664,40.696775140349033]]]},
{“type”:“Polygon”,“coordinates”:[[[-73.985140983708192,40.680349049267583],[-74.029724408156682,40.672970047595463],[-74.043422283844933,40.641782462872115],[-74.012581189358343,40.617990065981623],[-73.968047801220749,40.625358290164748],[-73.954305509472675,40.656529678451555],[-73.985140983708192,40.680349049267583]]]},
{“type”:“Polygon”,“coordinates”:[[[-73.926766604813565,40.718903205013063],[-73.971393761028622,40.71154393543933],[-73.985140983708192,40.680349049267583],[-73.954305509472675,40.656529678451555],[-73.909728515658443,40.663878222244435],[-73.895936872069854,40.69505685239637],[-73.926766604813565,40.718903205013063]]]},
{“type”:“Polygon”,“coordinates”:[[[-73.943844904976629,40.773964402038523],[-73.988522328408948,40.766594382212254],[-74.0022744646159,40.735376026215022],[-73.971393761028622,40.71154393543933],[-73.926766604813565,40.718903205013063],[-73.912969923470314,40.750105305345329],[-73.943844904976629,40.773964402038523]]]}]},
“properties”: {“name”: “polygons”}}

The following example calculates the union of the polygons that represent the H3 cell and its neighbors.

let h3cell = '862a1072fffffff';
print cells = array_concat(pack_array(h3cell), geo_h3cell_neighbors(h3cell))
| mv-expand cells to typeof(string)
| project polygons = geo_h3cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project polygon = geo_union_polygons_array(arr)

Output

polygon
{
“type”: “Polygon”,
“coordinates”: [[[ -73.926766604813565, 40.718903205013063],[ -73.912969923470314, 40.750105305345329],[ -73.943844904976629, 40.773964402038523],[ -73.988522328408948, 40.766594382212254],[ -74.019448383546617, 40.79043914023697],[ -74.064132193843633, 40.783038509825005],[ -74.077839665342211, 40.751803958414136],[ -74.1224794808745, 40.744383587828388],[ -74.1361375042681, 40.713156370029125],[ -74.1052004095288, 40.689365648097251],[ -74.118853750491638, 40.658161927046628],[ -74.0879619670209, 40.6343838242296],[ -74.043422283844933, 40.641782462872115],[ -74.012581189358343, 40.617990065981623],[ -73.968047801220749, 40.625358290164755],[ -73.954305509472675, 40.656529678451555],[ -73.909728515658443, 40.663878222244442],[ -73.895936872069854, 40.695056852396377],[ -73.926766604813565, 40.718903205013063]]]}

The following example returns true because of the invalid H3 Cell token input.

print invalid = isnull(geo_h3cell_neighbors('abc'))

Output

invalid
1

6.12 - geo_h3cell_parent()

Learn how to use the geo_h3cell_parent() function to calculate the H3 cell parent.

Calculates the H3 cell parent.

Read more about H3 Cell.

Syntax

geo_h3cell_parent(h3cell,resolution)

Parameters

Name | Type | Required | Description
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell().
resolution | int | | Defines the requested parent cell resolution. Supported values are in the range [0, 14]. If unspecified, the immediate parent token is calculated.

Returns

An H3 Cell parent token string. If the H3 Cell is invalid or the requested parent resolution is higher than the given cell's resolution, the query will produce an empty result.

Examples

print parent_cell = geo_h3cell_parent('862a1072fffffff')

Output

parent_cell
852a1073fffffff

The following example calculates cell parent at level 1.

print parent_cell = geo_h3cell_parent('862a1072fffffff', 1)

Output

parent_cell
812a3ffffffffff
print parent_res = geo_h3cell_level(geo_h3cell_parent((geo_point_to_h3cell(1,1,10))))

Output

parent_res
9
print parent_res = geo_h3cell_level(geo_h3cell_parent(geo_point_to_h3cell(1,1,10), 3))

Output

parent_res
3
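
A common use of geo_h3cell_parent() is rolling fine-resolution cells up to a coarser resolution for aggregation. The following sketch assumes the nyc_taxi table used in earlier examples and counts pickups per level-6 parent cell:

nyc_taxi
| where pickup_longitude != 0 and pickup_latitude != 0
| extend fine_cell = geo_point_to_h3cell(pickup_longitude, pickup_latitude, 10)  // resolution 10 cell per pickup
| extend coarse_cell = geo_h3cell_parent(fine_cell, 6)                           // roll up to resolution 6
| summarize pickups = count() by coarse_cell
| top 5 by pickups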

The following example produces an empty result because of the invalid cell input.

print invalid = isempty(geo_h3cell_parent('123'))

Output

invalid
1

The following example produces an empty result because of the invalid parent resolution.

print invalid = isempty(geo_h3cell_parent('862a1072fffffff', 100))

Output

invalid
1

The following example produces an empty result because parent can’t be of a higher resolution than child.

print invalid = isempty(geo_h3cell_parent('862a1072fffffff', 15))

Output

invalid
1

6.13 - geo_h3cell_rings()

Learn how to use the geo_h3cell_rings() function to calculate the H3 cell rings.

Calculates the H3 cell Rings.

Read more about H3 Cell.

Syntax

geo_h3cell_rings(h3cell,distance)

Parameters

Name | Type | Required | Description
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell().
distance | int | ✔️ | Defines the maximum ring distance from the given cell. Valid distance is in the range [0, 142].

Returns

An ordered array of ring arrays, where the first ring contains the original cell, the second ring contains its neighboring cells, and so on. If either the H3 Cell or the distance is invalid, the query produces a null result.

Examples

The following example produces rings up to distance 2.

print rings = geo_h3cell_rings('861f8894fffffff', 2)

Output

rings
[
[“861f8894fffffff”],
[“861f88947ffffff”,“861f8895fffffff”,“861f88867ffffff”,“861f8d497ffffff”,“861f8d4b7ffffff”,“861f8896fffffff”],
[“861f88967ffffff”,“861f88977ffffff”,“861f88957ffffff”,“861f8882fffffff”,“861f88877ffffff”,“861f88847ffffff”,“861f8886fffffff”,“861f8d49fffffff”,“861f8d487ffffff”,“861f8d4a7ffffff”,“861f8d59fffffff”,“861f8d597ffffff”]
]

The following example produces all cells at ring distance 1 (the immediate neighbors).

print neighbors = geo_h3cell_rings('861f8894fffffff', 1)[1]

Output

neighbors
[“861f88947ffffff”, “861f8895fffffff”, “861f88867ffffff”, “861f8d497ffffff”, “861f8d4b7ffffff”,“861f8896fffffff”]

The following example produces a flat list of cells from all rings.

print rings = geo_h3cell_rings('861f8894fffffff', 1)
| mv-apply rings on 
(
  summarize cells = make_list(rings)
)

Output

cells
[“861f8894fffffff”,“861f88947ffffff”,“861f8895fffffff”,“861f88867ffffff”,“861f8d497ffffff”,“861f8d4b7ffffff”,“861f8896fffffff”]
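
The flattened list of cells can then serve as a filter. The following sketch (again assuming the nyc_taxi table used in earlier examples) keeps only pickups whose resolution-6 H3 cell lies within ring distance 1 of a reference cell:

// Cells within ring distance 1 of the cell covering a reference point.
let reference_cell = geo_point_to_h3cell(-73.9741, 40.7914, 6);
let nearby_cells = toscalar(
    print rings = geo_h3cell_rings(reference_cell, 1)
    | mv-apply rings on (summarize make_list(rings))
    | project list_rings);
nyc_taxi
| where set_has_element(nearby_cells, geo_point_to_h3cell(pickup_longitude, pickup_latitude, 6))
| take 10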

The following example assembles GeoJSON geometry collection of all cells.

print rings = geo_h3cell_rings('861f8894fffffff', 1)
| mv-apply rings on 
(
  summarize make_list(rings)
)
| mv-expand list_rings to typeof(string)
| project polygon = geo_h3cell_to_polygon(list_rings)
| summarize polygon_lst = make_list(polygon)
| project geojson = bag_pack(
    "type", "Feature",
    "geometry", bag_pack("type", "GeometryCollection", "geometries", polygon_lst),
    "properties", bag_pack("name", "H3 polygons collection"))

Output

geojson
{ “type”: “Feature”, “geometry”: { “type”: “GeometryCollection”, “geometries”: [ … … … ]}, “properties”: { “name”: “H3 polygons collection” }}

The following example returns true because of the invalid cell.

print is_null = isnull(geo_h3cell_rings('abc', 3))

Output

is_null
1

The following example returns true because of the invalid distance.

print is_null = isnull(geo_h3cell_rings('861f8894fffffff', 150))

Output

is_null
1

6.14 - geo_h3cell_to_central_point()

Learn how to use the geo_h3cell_to_central_point() function to calculate the geospatial coordinates that represent the center of an H3 cell.

Calculates the geospatial coordinates that represent the center of an H3 Cell.

Read more about H3 Cell.

Syntax

geo_h3cell_to_central_point(h3cell)

Parameters

Name | Type | Required | Description
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell().

Returns

The geospatial coordinate values in GeoJSON Format and of a dynamic data type. If the H3 cell token is invalid, the query will produce a null result.

Examples

print h3cell = geo_h3cell_to_central_point("862a1072fffffff")

Output

h3cell
{
“type”: “Point”,
“coordinates”: [-74.016008479792447, 40.7041679083504]
}

The following example returns the longitude of the H3 Cell center point:

print longitude = geo_h3cell_to_central_point("862a1072fffffff").coordinates[0]

Output

longitude
-74.0160084797924
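
As with geohashes, the central point can be turned into a map deep-link. The following sketch reuses the point_to_map_url helper shown earlier in the geo_geohash_to_central_point() section:

let point_to_map_url = (_point:dynamic, _title:string)
{
    strcat('https://www.bing.com/maps?sp=point.', _point.coordinates[1], '_', _point.coordinates[0], '_', url_encode(_title))
};
print url = point_to_map_url(geo_h3cell_to_central_point('862a1072fffffff'), "H3 cell center")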

The following example returns a null result because of the invalid H3 cell token input.

print h3cell = geo_h3cell_to_central_point("1")

Output

h3cell

6.15 - geo_h3cell_to_polygon()

Learn how to use the geo_h3cell_to_polygon() function to calculate the polygon that represents the H3 Cell rectangular area.

Calculates the polygon that represents the H3 Cell rectangular area.

Read more about H3 Cell.

Syntax

geo_h3cell_to_polygon(h3cell)

Parameters

Name | Type | Required | Description
h3cell | string | ✔️ | An H3 Cell token value as it was calculated by geo_point_to_h3cell().

Returns

Polygon in GeoJSON Format and of a dynamic data type. If the H3 Cell is invalid, the query will produce a null result.

Examples

print geo_h3cell_to_polygon("862a1072fffffff")

Output

print_0
{
“type”: “Polygon”,
“coordinates”: [[[-74.0022744646159, 40.735376026215022], [-74.046908029686236, 40.727986222489115], [-74.060610712223664, 40.696775140349033],[ -74.029724408156682, 40.672970047595463], [-73.985140983708192, 40.680349049267583],[ -73.971393761028622, 40.71154393543933], [-74.0022744646159, 40.735376026215022]]]
}

The following example assembles GeoJSON geometry collection of H3 Cell polygons.

// H3 cell GeoJSON collection
datatable(lng:real, lat:real)
[
    -73.956683, 40.807907,
    -73.916869, 40.818314,
    -73.989148, 40.743273,
]
| project h3_hash = geo_point_to_h3cell(lng, lat, 6)
| project h3_hash_polygon = geo_h3cell_to_polygon(h3_hash)
| summarize h3_hash_polygon_lst = make_list(h3_hash_polygon)
| project bag_pack(
    "type", "Feature",
    "geometry", bag_pack("type", "GeometryCollection", "geometries", h3_hash_polygon_lst),
    "properties", bag_pack("name", "H3 polygons collection"))

Output

Column1
{
“type”: “Feature”,
“geometry”: {“type”: “GeometryCollection”, “geometries”: [{“type”: “Polygon”,“coordinates”: [[[-73.9609635556213, 40.829061732419916], [-74.005691351383675, 40.821680937801922], [-74.019448383546617, 40.790439140236963], [-73.988522328408948, 40.766594382212254], [-73.943844904976629, 40.773964402038523], [-73.930043202964953, 40.805189944379514], [-73.9609635556213, 40.829061732419916]]]},
{“type”: “Polygon”, “coordinates”: [[[-73.902385078754875, 40.867671551513595], [-73.94715685019348, 40.860310688399885], [-73.9609635556213, 40.829061732419916], [-73.930043202964953, 40.805189944379514], [-73.885321931061725, 40.812540084842404 ], [-73.871470551071766, 40.843772725733125], [ -73.902385078754875, 40.867671551513595]]]},
{“type”: “Polygon”,“coordinates”: [[[-73.943844904976629, 40.773964402038523], [-73.988522328408948, 40.766594382212254], [-74.0022744646159, 40.735376026215022], [-73.971393761028622, 40.71154393543933], [-73.926766604813565, 40.718903205013063], [ -73.912969923470314, 40.750105305345329 ], [-73.943844904976629, 40.773964402038523]]]}]
},
“properties”: {“name”: “H3 polygons collection”}
}

The following example returns a null result because of the invalid H3 Cell token input.

print geo_h3cell_to_polygon("@")

Output

print_0

6.16 - geo_intersection_2lines()

Learn how to use the geo_intersection_2lines() function to calculate the intersection of two line strings or multiline strings.

Calculates the intersection of two lines or multilines.

Syntax

geo_intersection_2lines(lineString1,lineString2)

Parameters

Name | Type | Required | Description
lineString1 | dynamic | ✔️ | A line or multiline in the GeoJSON format.
lineString2 | dynamic | ✔️ | A line or multiline in the GeoJSON format.

Returns

Intersection in GeoJSON Format and of a dynamic data type. If a LineString or a MultiLineString is invalid, the query will produce a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2],…, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [line_1, line_2,…, line_N]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Examples

The following example calculates intersection between two lines. In this case, the result is a point.

let lineString1 = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
print intersection = geo_intersection_2lines(lineString1, lineString2)

Output

intersection
{“type”: “Point”,“coordinates”: [-73.979837116670978,40.783989289772165]}

The following example calculates intersection between two lines. In this case, the result is a line.

let line = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
print intersection = geo_intersection_2lines(line, line)

Output

intersection
{“type”: “LineString”,“coordinates”: [[ -73.978929, 40.785155],[ -73.980903, 40.782621]]}

The following two lines don’t intersect.

let lineString1 = dynamic({"type":"LineString","coordinates":[[1, 1],[2, 2]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[3, 3],[4, 4]]});
print intersection = geo_intersection_2lines(lineString1, lineString2)

Output

intersection
{“type”: “GeometryCollection”, “geometries”: []}

The following example will return a null result because one of the lines is invalid.

let lineString1 = dynamic({"type":"LineString","coordinates":[[1, 1],[2, 2]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[3, 3]]});
print invalid = isnull(geo_intersection_2lines(lineString1, lineString2))

Output

invalid
1

6.17 - geo_intersection_2polygons()

Learn how to use the geo_intersection_2polygons() function to calculate the intersection of two polygons or multipolygons.

Calculates the intersection of two polygons or multipolygons.

Syntax

geo_intersection_2polygons(polygon1,polygon2)

Parameters

Name | Type | Required | Description
polygon1 | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format.
polygon2 | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format.

Returns

Intersection in GeoJSON Format and of a dynamic data type. If a Polygon or a MultiPolygon is invalid, the query will produce a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ],…, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.
  • Polygon contains its vertices.

Examples

The following example calculates intersection between two polygons. In this case, the result is a polygon.

let polygon1 = dynamic({"type":"Polygon","coordinates":[[[-73.9630937576294,40.77498840732385],[-73.963565826416,40.774383111780914],[-73.96205306053162,40.773745311181585],[-73.96160781383514,40.7743912365898],[-73.9630937576294,40.77498840732385]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)

Output

intersection
{“type”: “Polygon”, “coordinates”: [[[-73.962105776437156,40.774591360999679],[-73.962642403166868,40.774807020251778],[-73.9631313085556,40.774578106920352],[-73.962079882621765,40.774167803982927],[-73.962105776437156,40.774591360999679]]]}
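
Because the result here is itself a polygon, it can be fed to other polygon functions. For example, the following sketch computes the area of the overlap in square meters with geo_polygon_area():

let polygon1 = dynamic({"type":"Polygon","coordinates":[[[-73.9630937576294,40.77498840732385],[-73.963565826416,40.774383111780914],[-73.96205306053162,40.773745311181585],[-73.96160781383514,40.7743912365898],[-73.9630937576294,40.77498840732385]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
print intersection_area = geo_polygon_area(geo_intersection_2polygons(polygon1, polygon2))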

The following example calculates intersection between two polygons. In this case, the result is a point.

let polygon1 = dynamic({"type":"Polygon","coordinates":[[[2,45],[0,45],[1,44],[2,45]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[3,44],[2,45],[2,43],[3,44]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)

Output

intersection
{“type”: “Point”,“coordinates”: [2,45]}

The following two polygons intersection is a collection.

let polygon1 = dynamic({"type":"Polygon","coordinates":[[[2,45],[0,45],[1,44],[2,45]]]});
let polygon2 = dynamic({"type":"MultiPolygon","coordinates":[[[[3,44],[2,45],[2,43],[3,44]]],[[[1.192,45.265],[1.005,44.943],[1.356,44.937],[1.192,45.265]]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)

Output

intersection
{“type”: “GeometryCollection”,“geometries”: [
{ “type”: “Point”, “coordinates”: [2, 45]},
{ “type”: “Polygon”, “coordinates”: [[[1.3227075526410679,45.003909145068739],[1.0404565374899824,45.004356403066552],[1.005,44.943],[1.356,44.937],[1.3227075526410679,45.003909145068739]]]}]}

The following two polygons don’t intersect.

let polygon1 = dynamic({"type":"Polygon","coordinates":[[[2,45],[0,45],[1,44],[2,45]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[3,44],[3,45],[2,43],[3,44]]]});
print intersection = geo_intersection_2polygons(polygon1, polygon2)

Output

intersection
{“type”: “GeometryCollection”, “geometries”: []}

The following example finds all counties in the USA that intersect with the area of interest polygon.

let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| project name, intersection = geo_intersection_2polygons(county, area_of_interest)
| where array_length(intersection.geometries) != 0

Output

name | intersection
New York | {“type”: “Polygon”,“coordinates”: [[[-73.96213352680206, 40.775045280447145], [-73.9631313085556, 40.774578106920345], [-73.96207988262177,40.77416780398293],[-73.96213352680206, 40.775045280447145]]]}

The following example will return a null result because one of the polygons is invalid.

let central_park_polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
let invalid_polygon = dynamic({"type":"Polygon"});
print isnull(geo_intersection_2polygons(invalid_polygon, central_park_polygon))

Output

print_0
1

6.18 - geo_intersection_line_with_polygon()

Learn how to use the geo_intersection_line_with_polygon() function to calculate the intersection of a line string or a multiline string with a polygon or a multipolygon.

Calculates the intersection of a line or a multiline with a polygon or a multipolygon.

Syntax

geo_intersection_line_with_polygon(lineString,polygon)

Parameters

Name | Type | Required | Description
lineString | dynamic | ✔️ | A LineString or MultiLineString in the GeoJSON format.
polygon | dynamic | ✔️ | A Polygon or MultiPolygon in the GeoJSON format.

Returns

Intersection in GeoJSON Format and of a dynamic data type. If the lineString, multiLineString, polygon, or multipolygon is invalid, the query will produce a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [line_1, line_2, …, line_N]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N],…, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.
  • Polygon contains its vertices.

Examples

The following example calculates intersection between line and polygon. In this case, the result is a line.

let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print intersection = geo_intersection_line_with_polygon(lineString, polygon)

Output

intersection
{“type”: “LineString”,“coordinates”: [[-73.975611956578192,40.78060906714618],[-73.974552,40.779761]]}
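
The clipped geometry can be measured as well. The following sketch uses geo_line_length() to find how many meters of the line fall inside the polygon:

let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print length_in_meters = geo_line_length(geo_intersection_line_with_polygon(lineString, polygon))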

The following example calculates intersection between line and polygon. In this case, the result is a multiline.

let lineString = dynamic({"type":"LineString","coordinates":[[-110.522, 39.198],[-91.428, 40.880]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-90.263,36.738],[-102.041,45.274],[-109.335,36.527],[-90.263,36.738]],[[-100.393,41.705],[-103.139,38.925],[-97.558,39.113],[-100.393,41.705]]]});
print intersection = geo_intersection_line_with_polygon(lineString, polygon)

Output

intersection
{“type”: “MultiLineString”,“coordinates”: [[[ -106.89353655881905, 39.769226209776306],[ -101.74448553679453, 40.373506008712525]],[[-99.136499431328858, 40.589336512699994],[-95.284527737311791, 40.799060242246348]]]}

The following line and polygon don’t intersect.

let lineString = dynamic({"type":"LineString","coordinates":[[1, 1],[2, 2]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print intersection = geo_intersection_line_with_polygon(lineString, polygon)

Output

intersection
{“type”: “GeometryCollection”,“geometries”: []}

The following example finds all roads in the NYC GeoJSON roads table that intersect with the area of interest polygon literal.

let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
NY_Manhattan_Roads
| project name = features.properties.Label, road = features.geometry
| project name, intersection = geo_intersection_line_with_polygon(road, area_of_interest)
| where array_length(intersection.geometries) != 0

Output

nameintersection
CentralParkW{“type”:“MultiLineString”,“coordinates”:[[[-73.958295846836933,40.800316027289647],[-73.9582724,40.8003415]],[[-73.958413422194482,40.80037239620097],[-73.9584093,40.8003797]]]}
FrederickDouglassCir{“type”:“LineString”,“coordinates”:[[-73.9579272943862,40.800751229494182],[-73.9579019,40.8007238],[-73.9578688,40.8006749],[-73.9578508,40.8006203],[-73.9578459,40.800570199999996],[-73.9578484,40.80053310000001],[-73.9578627,40.800486700000008],[-73.957913,40.800421100000008],[-73.9579668,40.8003923],[-73.9580189,40.80037260000001],[-73.9580543,40.8003616],[-73.9581237,40.8003395],[-73.9581778,40.8003365],[-73.9582724,40.8003415],[-73.958308,40.8003466],[-73.9583328,40.8003517],[-73.9583757,40.8003645],[-73.9584093,40.8003797],[-73.9584535,40.80041099999999],[-73.9584818,40.8004536],[-73.958507000000012,40.8004955],[-73.9585217,40.800562400000004],[-73.9585282,40.8006155],[-73.958416200000016,40.8007325],[-73.9583541,40.8007785],[-73.9582772,40.800811499999995],[-73.9582151,40.8008285],[-73.958145918999392,40.800839887820239]]}
W110thSt{“type”:“MultiLineString”,“coordinates”:[[[-73.957828446036331,40.800476476316327],[-73.9578627,40.800486700000008]],[[-73.9585282,40.8006155],[-73.958565492035873,40.800631133466972]],[[-73.958416200000016,40.8007325],[-73.958446850928084,40.800744577466617]]]}
WestDr{“type”:“LineString”,“coordinates”:[[-73.9580543,40.8003616],[-73.958009693938735,40.800250494588468]]}

The following example finds all counties in the USA that intersect with the area of interest LineString literal.

let area_of_interest = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| project name, intersection = geo_intersection_line_with_polygon(area_of_interest, county)
| where array_length(intersection.geometries) != 0

Output

name | intersection
New York | {“type”: “LineString”,“coordinates”: [[-73.971590995788574, 40.794513338780895], [-73.967385292053223, 40.792758888618756],[-73.969788551330566, 40.789769718601512]]}

The following example will return a null result because the LineString is invalid.

let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
print is_invalid = isnull(geo_intersection_line_with_polygon(lineString, polygon))

Output

is_invalid
1

The following example will return a null result because the polygon is invalid.

let lineString = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
let polygon = dynamic({"type":"Polygon","coordinates":[]});
print is_invalid = isnull(geo_intersection_line_with_polygon(lineString, polygon))

Output

is_invalid
1

6.19 - geo_intersects_2lines()

Learn how to use the geo_intersects_2lines() function to check if two line strings or multiline strings intersect.

Calculates whether two lines or multilines intersect.

Syntax

geo_intersects_2lines(lineString1,lineString2)

Parameters

NameTypeRequiredDescription
lineString1dynamic✔️A line or multiline in the GeoJSON format.
lineString2dynamic✔️A line or multiline in the GeoJSON format.

Returns

Indicates whether the two lines or multilines intersect. If a lineString or a multiLineString is invalid, the query will produce a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Examples

The following example checks whether two literal lines intersect.

let lineString1 = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
print intersects = geo_intersects_2lines(lineString1, lineString2)

Output

intersects
True
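
The format constraints above also allow MultiLineString input. As a minimal sketch that reuses the coordinates from the previous example, the following query checks a MultiLineString against one of its member lines and is expected to return true.

let multiLineString = dynamic({"type":"MultiLineString","coordinates":[[[-73.978929,40.785155],[-73.980903,40.782621]],[[-73.985195,40.788275],[-73.974552,40.779761]]]});
let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
print intersects = geo_intersects_2lines(multiLineString, lineString)

Output

intersects
True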

The following example finds all roads in the NYC GeoJSON roads table that intersect with a line of interest.

let my_road = dynamic({"type":"LineString","coordinates":[[-73.97892951965332,40.78515573551921],[-73.98090362548828,40.78262115769851]]});
NY_Manhattan_Roads
| project name = features.properties.Label, road = features.geometry
| where geo_intersects_2lines(road, my_road)
| project name

Output

name
Broadway
W 78th St
W 79th St
W 80th St
W 81st St

The following example will return a null result because one of the lines is invalid.

let lineString1 = dynamic({"type":"LineString","coordinates":[[-73.978929,40.785155],[-73.980903,40.782621]]});
let lineString2 = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275]]});
print isnull(geo_intersects_2lines(lineString1, lineString2))

Output

print_0
True

6.20 - geo_intersects_2polygons()

Learn how to use the geo_intersects_2polygons() function to calculate whether two polygons or multipolygons intersect.

Calculates whether two polygons or multipolygons intersect.

Syntax

geo_intersects_2polygons(polygon1,polygon2)

Parameters

NameTypeRequiredDescription
polygon1dynamic✔️Polygon or multipolygon in the GeoJSON format.
polygon2dynamic✔️Polygon or multipolygon in the GeoJSON format.

Returns

Indicates whether two polygons or multipolygons intersect. If the polygon or the multipolygon is invalid, the query will produce a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1], …, [lng_i,lat_i], …,[lng_j,lat_j], …,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1], …,[lng_i,lat_i], …,[lng_j,lat_j], …,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.
  • Polygon contains its vertices.

Examples

The following example checks whether two literal polygons intersect.

let polygon1 = dynamic({"type":"Polygon","coordinates":[[[-73.9630937576294,40.77498840732385],[-73.963565826416,40.774383111780914],[-73.96205306053162,40.773745311181585],[-73.96160781383514,40.7743912365898],[-73.9630937576294,40.77498840732385]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
print geo_intersects_2polygons(polygon1, polygon2)

Output

print_0
True
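
For contrast, the following sketch uses two small, far-apart polygons with arbitrary illustrative coordinates (not taken from a real dataset); because the polygons don't overlap, the query is expected to return false.

let polygon1 = dynamic({"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,1],[0,0]]]});
let polygon2 = dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]});
print intersects = geo_intersects_2polygons(polygon1, polygon2)

Output

intersects
False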

The following example finds all counties in the USA that intersect with the area of interest, defined by a literal polygon.

let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.96213352680206,40.775045280447145],[-73.9631313085556,40.774578106920345],[-73.96207988262177,40.77416780398293],[-73.96213352680206,40.775045280447145]]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| where geo_intersects_2polygons(county, area_of_interest)
| project name

Output

name
New York

The following example will return a null result because one of the polygons is invalid.

let central_park_polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
let invalid_polygon = dynamic({"type":"Polygon"});
print isnull(geo_intersects_2polygons(invalid_polygon, central_park_polygon))

Output

print_0
True

6.21 - geo_intersects_line_with_polygon()

Learn how to use the geo_intersects_line_with_polygon() function to check if a line string or a multiline string intersects with a polygon or a multipolygon.

Calculates whether a line or multiline intersects with a polygon or a multipolygon.

Syntax

geo_intersects_line_with_polygon(lineString,polygon)

Parameters

NameTypeRequiredDescription
lineStringdynamic✔️A LineString or MultiLineString in the GeoJSON format.
polygondynamic✔️A Polygon or MultiPolygon in the GeoJSON format.

Returns

Indicates whether the line or multiline intersects with the polygon or multipolygon. If the lineString, multiLineString, polygon, or multipolygon is invalid, the query will produce a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1], …,[lng_i,lat_i], …,[lng_j,lat_j], …,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1], …,[lng_i,lat_i], …,[lng_j,lat_j], …,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.
  • Polygon doesn’t necessarily contain its vertices.

Examples

The following example checks whether a literal LineString intersects with a Polygon.

let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275],[-73.974552,40.779761]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print intersects = geo_intersects_line_with_polygon(lineString, polygon)

Output

intersects
True
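
Conversely, a line that lies entirely outside the polygon doesn't intersect it. The following sketch reuses the polygon above with a line placed farther north (illustrative coordinates) and is expected to return false.

let lineString = dynamic({"type":"LineString","coordinates":[[-73.95,40.85],[-73.94,40.86]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.9712905883789,40.78580561168767],[-73.98004531860352,40.775276834803655],[-73.97000312805176,40.77852663535664],[-73.9712905883789,40.78580561168767]]]});
print intersects = geo_intersects_line_with_polygon(lineString, polygon)

Output

intersects
False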

The following example finds all roads in the NYC GeoJSON roads table that intersect with the area of interest, defined by a literal polygon.

let area_of_interest = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
NY_Manhattan_Roads
| project name = features.properties.Label, road = features.geometry
| where geo_intersects_line_with_polygon(road, area_of_interest)
| project name

Output

name
Central Park W
Frederick Douglass Cir
W 110th St
West Dr

The following example finds all counties in the USA that intersect with the area of interest, defined by a literal LineString.

let area_of_interest = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
US_Counties
| project name = features.properties.NAME, county = features.geometry
| where geo_intersects_line_with_polygon(area_of_interest, county)
| project name

Output

name
New York

The following example will return a null result because the LineString is invalid.

let lineString = dynamic({"type":"LineString","coordinates":[[-73.985195,40.788275]]});
let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.95768642425537,40.80065354924362],[-73.9582872390747,40.80089719667298],[-73.95869493484497,40.80050736035672],[-73.9580512046814,40.80019873831593],[-73.95768642425537,40.80065354924362]]]});
print isnull(geo_intersects_line_with_polygon(lineString, polygon))

Output

print_0
True

The following example will return a null result because the polygon is invalid.

let lineString = dynamic({"type":"LineString","coordinates":[[-73.97159099578857,40.794513338780895],[-73.96738529205322,40.792758888618756],[-73.96978855133057,40.789769718601505]]});
let polygon = dynamic({"type":"Polygon","coordinates":[]});
print isnull(geo_intersects_line_with_polygon(lineString, polygon))

Output

print_0
True

6.22 - geo_line_buffer()

Learn how to use the geo_line_buffer() function to calculate the buffer of a line or a multiline.

Calculates a polygon or multipolygon that contains all points within the given radius of the input line or multiline on Earth.

Syntax

geo_line_buffer(lineString, radius, [ tolerance ])

Parameters

NameTypeRequiredDescription
lineStringdynamic✔️A LineString or MultiLineString in the GeoJSON format.
radiusreal✔️Buffer radius in meters. Valid value must be positive.
tolerancerealDefines the tolerance in meters that determines how much a polygon can deviate from the ideal radius. If unspecified, the default value 10 is used. Tolerance should be no lower than 0.0001% of the radius. Specifying a tolerance larger than the radius lowers the tolerance to the largest possible value below the radius.

Returns

Polygon or MultiPolygon around the input LineString or MultiLineString. If the coordinates, radius, or tolerance are invalid, the query produces a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Examples

The following query calculates a polygon around the line, with a radius of 4 meters and a tolerance of 0.1 meters.

let line = dynamic({"type":"LineString","coordinates":[[-80.66634997047466,24.894526340592122],[-80.67373241820246,24.890808090321286]]});
print buffer = geo_line_buffer(line, 4, 0.1)
buffer
{“type”: “Polygon”, “coordinates”: [ … ]}
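
Because the returned polygon contains every point within the radius of the input line, the line itself lies inside its buffer. The following sketch reuses the same line and parameters and combines geo_line_buffer() with geo_intersects_line_with_polygon(); it is expected to return true.

let line = dynamic({"type":"LineString","coordinates":[[-80.66634997047466,24.894526340592122],[-80.67373241820246,24.890808090321286]]});
print line_in_buffer = geo_intersects_line_with_polygon(line, geo_line_buffer(line, 4, 0.1))
line_in_buffer
True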

The following query calculates a buffer around each line and unifies the results.

datatable(line:dynamic)
[
    dynamic({"type":"LineString","coordinates":[[14.429214068940496,50.10043066548272],[14.431184174126173,50.10046525983731]]}),
    dynamic({"type":"LineString","coordinates":[[14.43030222687753,50.100780677801936],[14.4303847111523,50.10020274910934]]})
]
| project buffer = geo_line_buffer(line, 2, 0.1)
| summarize polygons = make_list(buffer)
| project result = geo_union_polygons_array(polygons)
result
{“type”: “Polygon”,“coordinates”: [ … ]}

The following example returns true because of the invalid line.

print buffer = isnull(geo_line_buffer(dynamic({"type":"LineString"}), 5))
buffer
True

The following example returns true because of the invalid radius.

print buffer = isnull(geo_line_buffer(dynamic({"type":"LineString","coordinates":[[0,0],[1,1]]}), 0))
buffer
True

6.23 - geo_line_centroid()

Learn how to use the geo_line_centroid() function to calculate the centroid of a line or a multiline on Earth.

Calculates the centroid of a line or a multiline on Earth.

Syntax

geo_line_centroid(lineString)

Parameters

NameTypeRequiredDescription
lineStringdynamic✔️A LineString or MultiLineString in the GeoJSON format.

Returns

The centroid coordinate values in GeoJSON format and of a dynamic data type. If the line or the multiline is invalid, the query produces a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.

Examples

The following example calculates the line centroid.

let line = dynamic({"type":"LineString","coordinates":[[-73.95796, 40.80042], [-73.97317, 40.764486]]});
print centroid = geo_line_centroid(line);

Output

centroid
{“type”: “Point”, “coordinates”: [-73.965567057230942, 40.782453249627416]}

The following example calculates the longitude of the line centroid.

let line = dynamic({"type":"LineString","coordinates":[[-73.95807266235352,40.800426144169315],[-73.94966125488281,40.79691751000055],[-73.97317886352539,40.764486356930334],[-73.98210525512695,40.76786669510221],[-73.96004676818848,40.7980870753293]]});
print centroid = geo_line_centroid(line)
| project lng = centroid.coordinates[0]

Output

lng
-73.9660675626837

The following example visualizes the line centroid on a map.

let line = dynamic({"type":"MultiLineString","coordinates":[[[-73.95798683166502,40.800556090021466],[-73.98193359375,40.76819171855746]],[[-73.94940376281738,40.79691751000055],[-73.97317886352539,40.76435634049001]]]});
print centroid = geo_line_centroid(line)
| render scatterchart with (kind = map)

Screenshot of the New York City Central Park line centroid.

The following example returns true because of the invalid line.

print is_bad_line = isnull(geo_line_centroid(dynamic({"type":"LineString","coordinates":[[1, 1]]})))

Output

is_bad_line
true

6.24 - geo_line_densify()

Learn how to use the geo_line_densify() function to convert planar lines or multiline edges to geodesics.

Converts planar lines or multiline edges to geodesics by adding intermediate points.

Syntax

geo_line_densify(lineString, tolerance, [ preserve_crossing ])

Parameters

NameTypeRequiredDescription
lineStringdynamic✔️A LineString or MultiLineString in the GeoJSON format.
toleranceint, long, or realDefines maximum distance in meters between the original planar edge and the converted geodesic edge chain. Supported values are in the range [0.1, 10000]. If unspecified, the default value 10 is used.
preserve_crossingboolIf true, preserves edge crossing over antimeridian. If unspecified, the default value false is used.

Returns

Densified line in the GeoJSON format and of a dynamic data type. If either the line or tolerance is invalid, the query will produce a null result.

LineString definition

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})

  • LineString coordinates array must contain at least two entries.
  • The coordinates [longitude, latitude] must be valid. The longitude must be a real number in the range [-180, +180] and the latitude must be a real number in the range [-90, +90].
  • The edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Constraints

  • The maximum number of points in the densified line is limited to 10485760.
  • Storing lines in dynamic format has size limits.

Motivation

  • The GeoJSON format defines an edge between two points as a straight Cartesian line, while geo_line_densify() uses a geodesic.
  • The decision to use geodesic or planar edges might depend on the dataset and is especially relevant for long edges.

Examples

The following example densifies a road on Manhattan Island. The edge is short, and the distance between the planar edge and its geodesic counterpart is less than the distance specified by the tolerance. As a result, the line remains unchanged.

print densified_line = tostring(geo_line_densify(dynamic({"type":"LineString","coordinates":[[-73.949247, 40.796860],[-73.973017, 40.764323]]})))

Output

densified_line
{“type”:“LineString”,“coordinates”:[[-73.949247, 40.796860], [-73.973017, 40.764323]]}

The following example densifies an edge that is about 130 km long.

print densified_line = tostring(geo_line_densify(dynamic({"type":"LineString","coordinates":[[50, 50], [51, 51]]})))

Output

densified_line
{“type”:“LineString”,“coordinates”:[[50,50],[50.125,50.125],[50.25,50.25],[50.375,50.375],[50.5,50.5],[50.625,50.625],[50.75,50.75],[50.875,50.875],[51,51]]}
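
The optional preserve_crossing parameter isn't shown in the examples above. As a sketch with illustrative coordinates, the following query densifies an edge that crosses the antimeridian and sets preserve_crossing to true to keep the crossing; the exact densified output isn't reproduced here.

print densified_line = tostring(geo_line_densify(dynamic({"type":"LineString","coordinates":[[179.5,1],[-179.5,1]]}), 100, true))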

The following example returns a null result because of the invalid coordinate input.

print densified_line = geo_line_densify(dynamic({"type":"LineString","coordinates":[[300,1],[1,1]]}))

Output

densified_line

The following example returns a null result because of the invalid tolerance input.

print densified_line = geo_line_densify(dynamic({"type":"LineString","coordinates":[[1,1],[2,2]]}), 0)

Output

densified_line

6.25 - geo_line_length()

Learn how to use the geo_line_length() function to calculate the total length of a line string or a multiline string on Earth.

Calculates the total length of a line or a multiline on Earth.

Syntax

geo_line_length(lineString)

Parameters

NameTypeRequiredDescription
lineStringdynamic✔️A LineString or MultiLineString in the GeoJSON format.

Returns

The total length of a line or a multiline, in meters, on Earth. If the line or multiline is invalid, the query will produce a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Examples

The following example calculates the total line length, in meters.

let line = dynamic({"type":"LineString","coordinates":[[-73.95807266235352,40.800426144169315],[-73.94966125488281,40.79691751000055],[-73.97317886352539,40.764486356930334]]});
print length = geo_line_length(line)

Output

length
4922.48016992081

The following example calculates the total multiline length, in meters.

let line = dynamic({"type":"MultiLineString","coordinates":[[[-73.95798683166502,40.800556090021466],[-73.98193359375,40.76819171855746]],[[-73.94940376281738,40.79691751000055],[-73.97317886352539,40.76435634049001]]]});
print length = geo_line_length(line)

Output

length
8262.24339753741
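
geo_line_length() can also be applied per row. As a sketch that assumes the NY_Manhattan_Roads table used in earlier examples, the following query computes the length of every road and sums them; the resulting total isn't reproduced here.

NY_Manhattan_Roads
| project road = features.geometry
| project length = geo_line_length(road)
| summarize total_length_meters = sum(length)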

The following example returns True because of the invalid line.

print is_bad_line = isnull(geo_line_length(dynamic({"type":"LineString","coordinates":[[1, 1]]})))

Output

is_bad_line
True

6.26 - geo_line_simplify()

Learn how to use the geo_line_simplify() function to simplify a line string or a multiline string.

Simplifies a line or a multiline by replacing nearly straight chains of short edges with a single long edge on Earth.

Syntax

geo_line_simplify(lineString, tolerance)

Parameters

NameTypeRequiredDescription
lineStringdynamic✔️A LineString or MultiLineString in the GeoJSON format.
toleranceint, long, or realDefines minimum distance in meters between any two vertices. Supported values are in the range [0, ~7,800,000 meters]. If unspecified, the default value 10 is used.

Returns

A simplified line or multiline in the GeoJSON format and of a dynamic data type, such that no two vertices are closer than the tolerance. If either the line or the tolerance is invalid, the query will produce a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [[line_1, line_2, …, line_N]]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Examples

The following example simplifies the line by removing vertices that are within a 10-meter distance from each other.

let line = dynamic({"type":"LineString","coordinates":[[-73.97033169865608,40.789063020152824],[-73.97039607167244,40.78897975920816],[-73.9704617857933,40.78888837512432],[-73.97052884101868,40.7887949601531],[-73.9706052839756,40.788698498903564],[-73.97065222263336,40.78862640672032],[-73.97072866559029,40.78852791445617],[-73.97079303860664,40.788434498977836]]});
print simplified = geo_line_simplify(line, 10)

Output

simplified
{“type”: “LineString”, “coordinates”: [[-73.97033169865608, 40.789063020152824], [-73.97079303860664, 40.788434498977836]]}
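
One way to see the effect of simplification is to compare line lengths before and after. The following sketch reuses the line above; the exact values aren't reproduced here, but the simplified length is expected to be no longer than the original.

let line = dynamic({"type":"LineString","coordinates":[[-73.97033169865608,40.789063020152824],[-73.97039607167244,40.78897975920816],[-73.9704617857933,40.78888837512432],[-73.97052884101868,40.7887949601531],[-73.9706052839756,40.788698498903564],[-73.97065222263336,40.78862640672032],[-73.97072866559029,40.78852791445617],[-73.97079303860664,40.788434498977836]]});
print original_m = geo_line_length(line), simplified_m = geo_line_length(geo_line_simplify(line, 10))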

The following example simplifies lines and combines the results into a GeoJSON geometry collection.

NY_Manhattan_Roads
| project road = features.geometry
| project road_simplified = geo_line_simplify(road, 100)
| summarize roads_lst = make_list(road_simplified)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", roads_lst), "properties", bag_pack("name", "roads"))

Output

geojson
{“type”: “Feature”, “geometry”: {“type”: “GeometryCollection”, “geometries”: [ … ]}, “properties”: {“name”: “roads”}}

The following example simplifies lines and unifies the results.

NY_Manhattan_Roads
| project road = features.geometry
| project road_simplified = geo_line_simplify(road, 100)
| summarize roads_lst = make_list(road_simplified)
| project roads = geo_union_lines_array(roads_lst)

Output

roads
{“type”: “MultiLineString”, “coordinates”: [ … ]}

The following example returns True because of the invalid line.

print is_invalid_line = isnull(geo_line_simplify(dynamic({"type":"LineString","coordinates":[[1, 1]]})))

Output

is_invalid_line
True

The following example returns True because of the invalid tolerance.

print is_invalid_line = isnull(geo_line_simplify(dynamic({"type":"LineString","coordinates":[[1, 1],[2,2]]}), -1))

Output

is_invalid_line
True

The following example returns True because the high tolerance causes the small line to disappear.

print is_invalid_line = isnull(geo_line_simplify(dynamic({"type":"LineString","coordinates":[[1.1, 1.1],[1.2,1.2]]}), 100000))

Output

is_invalid_line
True

6.27 - geo_line_to_s2cells()

Learn how to use the geo_line_to_s2cells() function to calculate S2 cell tokens that cover a line or a multiline on Earth.

Calculates S2 cell tokens that cover a line or multiline on Earth. This function is a useful geospatial join tool.

Read more about S2 cell hierarchy.

Syntax

geo_line_to_s2cells(lineString [, level[ , radius]])

Parameters

NameTypeRequiredDescription
lineStringdynamic✔️Line or multiline in the GeoJSON format.
levelintDefines the requested cell level. Supported values are in the range [0, 30]. If unspecified, the default value 11 is used.
radiusrealBuffer radius in meters. If unspecified, the default value 0 is used.

Returns

Array of S2 cell token strings that cover a line or a multiline. If the radius is set to a positive value, then the covering includes both the input shape and all points within the radius of the input geometry.

If the line, level, or radius is invalid, or if the cell count exceeds the limit, the query will produce a null result.

Choosing the S2 cell level

  • Ideally we would want to cover every line with one or just a few unique cells such that no two lines share the same cell.
  • In practice, try covering with just a few cells, no more than a dozen. Covering with more than 10,000 cells might not yield good performance.
  • Query run time and memory consumption might differ greatly because of different S2 cell level values.

Performance improvement suggestions

  • If possible, reduce the line count based on the nature of the data or business needs. Filter out unnecessary lines before the join, scope to the area of interest, or unify lines.
  • For very large lines, reduce their size by using geo_line_simplify().
  • Changing the S2 cell level may improve performance and memory consumption.
  • Changing the join kind and hint may improve performance and memory consumption.
  • If a positive radius is set, buffering the shape with geo_line_buffer() and then using radius 0 may improve performance.

Examples

The following query finds all tube stations within 500 meters of streets and aggregates the station count by street name.

let radius = 500;
let tube_stations = datatable(tube_station_name:string, lng:real, lat: real)
[
    "St. James' Park",        -0.13451078568013486, 51.49919145858172,
     "London Bridge station", -0.08492752160134387, 51.504876316440914,
     // more points
];
let streets = datatable(street_name:string, line:dynamic)
[
    "Buckingham Palace", dynamic({"type":"LineString","coordinates":[[-0.1399656708283601,51.50190802248855],[-0.14088438832752104,51.50012082761452]]}),
    "London Bridge",    dynamic({"type":"LineString","coordinates":[[-0.087152,51.509596],[-0.088340,51.506110]]}),
    // more lines
];
let join_level = 14;
let lines = materialize(streets | extend id = new_guid());
let res = 
    lines
    | project id, covering = geo_line_to_s2cells(line, join_level, radius)
    | mv-expand covering to typeof(string)
    | join kind=inner hint.strategy=broadcast
    (
        tube_stations
        | extend covering = geo_point_to_s2cell(lng, lat, join_level)
    ) on covering;
res | lookup lines on id
| where geo_distance_point_to_line(lng, lat, line) <= radius
| summarize count = count() by name = street_name
name | count
Buckingham Palace | 1
London Bridge | 1
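
To see the covering itself, the following sketch prints the S2 cell tokens that cover a single short line at level 14 (coordinates reused from the street data above); the actual token values aren't reproduced here.

print covering = geo_line_to_s2cells(dynamic({"type":"LineString","coordinates":[[-0.087152,51.509596],[-0.088340,51.506110]]}), 14)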

If the line is invalid, a null result is returned.

let line = dynamic({"type":"LineString","coordinates":[[[0,0],[0,0]]]});
print isnull(geo_line_to_s2cells(line))
print_0
True

6.28 - geo_point_buffer()

Learn how to use the geo_point_buffer() function to calculate the buffer of a point.

Calculates a polygon that contains all points within the given radius of the point on Earth.

Syntax

geo_point_buffer(longitude, latitude, radius, [ tolerance ])

Parameters

NameTypeRequiredDescription
longitudereal✔️Geospatial coordinate longitude value in degrees. Valid value is a real number and in the range [-180, +180].
latitudereal✔️Geospatial coordinate latitude value in degrees. Valid value is a real number and in the range [-90, +90].
radiusreal✔️Buffer radius in meters. Valid value must be positive.
tolerancerealDefines the tolerance in meters that determines how much a polygon can deviate from the ideal radius. If unspecified, the default value 10 is used. Tolerance should be no lower than 0.0001% of the radius. Specifying a tolerance larger than the radius lowers the tolerance to the largest possible value below the radius.

Returns

Polygon around the input point. If the coordinates, radius, or tolerance are invalid, the query produces a null result.

Examples

The following query calculates a polygon around the [-115.1745008278, 36.1497251277] coordinates, with a 20-km radius.

print buffer = geo_point_buffer(-115.1745008278, 36.1497251277, 20000)
buffer
{“type”: “Polygon”,“coordinates”: [ … ]}
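
Since the returned polygon contains every point within the radius of the input point, the point itself falls inside its buffer. The following sketch combines geo_point_buffer() with geo_point_in_polygon() using the same coordinates and is expected to return true.

print center_in_buffer = geo_point_in_polygon(-115.1745008278, 36.1497251277, geo_point_buffer(-115.1745008278, 36.1497251277, 20000))
center_in_buffer
True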

The following query calculates a buffer around each point and unifies the results.

datatable(longitude:real, latitude:real, radius:real)
[
    real(-80.3212217992616), 25.268683367546604, 5000,
    real(-80.81717403605833), 24.82658441221962, 3000
]
| project buffer = geo_point_buffer(longitude, latitude, radius)
| summarize polygons = make_list(buffer)
| project result = geo_union_polygons_array(polygons)
result
{“type”: “MultiPolygon”,“coordinates”: [ … ]}

The following example returns true because of the invalid point.

print result = isnull(geo_point_buffer(200, 1, 0.1))
result
True

The following example returns true because of the invalid radius.

print result = isnull(geo_point_buffer(10, 10, -1))
result
True

6.29 - geo_point_in_circle()

Learn how to use the geo_point_in_circle() function to check if the geospatial coordinates are inside a circle on Earth.

Calculates whether the geospatial coordinates are inside a circle on Earth.

Syntax

geo_point_in_circle(p_longitude, p_latitude, pc_longitude, pc_latitude, c_radius)

Parameters

NameTypeRequiredDescription
p_longitudereal✔️Geospatial coordinate longitude value in degrees. Valid value is a real number and in the range [-180, +180].
p_latitudereal✔️Geospatial coordinate latitude value in degrees. Valid value is a real number and in the range [-90, +90].
pc_longitudereal✔️Circle center geospatial coordinate longitude value in degrees. Valid value is a real number and in the range [-180, +180].
pc_latitudereal✔️Circle center geospatial coordinate latitude value in degrees. Valid value is a real number and in the range [-90, +90].
c_radiusreal✔️Circle radius in meters. Valid value must be positive.

Returns

Indicates whether the geospatial coordinates are inside a circle. If the coordinates or circle is invalid, the query produces a null result.

Examples

The following example finds all the places in the area defined by the following circle: a radius of 18 km, centered at the [-122.317404, 47.609119] coordinates.

Screenshot of a map with places within 18 km of Seattle.

datatable(longitude:real, latitude:real, place:string)
[
    real(-122.317404), 47.609119, 'Seattle',                   // In circle 
    real(-123.497688), 47.458098, 'Olympic National Forest',   // In exterior of circle  
    real(-122.201741), 47.677084, 'Kirkland',                  // In circle
    real(-122.443663), 47.247092, 'Tacoma',                    // In exterior of circle
    real(-122.121975), 47.671345, 'Redmond',                   // In circle
]
| where geo_point_in_circle(longitude, latitude, -122.317404, 47.609119, 18000)
| project place

Output

place
Seattle
Kirkland
Redmond

The following example finds storm events in Orlando. The events are filtered to within 100 km of the Orlando coordinates and aggregated by event type and hash.

StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_circle(BeginLon, BeginLat, real(-81.3891), 28.5346, 1000 * 100)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat)
| project geo_s2cell_to_central_point(hash), EventType, count_
| render piechart with (kind=map) // map pie rendering available in Kusto Explorer desktop

Output

Screenshot of storm events in Orlando rendered with pie chart points on a map.

The following example shows New York City taxi pickups within 10 meters of a particular location. Relevant pickups are aggregated by hash.

nyc_taxi
| project pickup_longitude, pickup_latitude
| where geo_point_in_circle( pickup_longitude, pickup_latitude, real(-73.9928), 40.7429, 10)
| summarize by hash = geo_point_to_s2cell(pickup_longitude, pickup_latitude, 22)
| project geo_s2cell_to_central_point(hash)
| render scatterchart with (kind = map)

Output

Screenshot of the rendered map showing nearby New York city taxi pickups, as defined in the query.

The following example returns true.

print in_circle = geo_point_in_circle(-122.143564, 47.535677, -122.100896, 47.527351, 3500)

Output

in_circle
true
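
A point is inside the circle exactly when its geodesic distance to the circle center is no more than the radius. The following sketch illustrates this equivalence with geo_distance_2points() on the same coordinates; both columns are expected to return true.

print in_circle = geo_point_in_circle(-122.143564, 47.535677, -122.100896, 47.527351, 3500),
      within_radius = geo_distance_2points(-122.143564, 47.535677, -122.100896, 47.527351) <= 3500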

The following example returns false.

print in_circle = geo_point_in_circle(-122.137575, 47.630683, -122.100896, 47.527351, 3500)

Output

in_circle
false

The following example returns a null result because of the invalid coordinate input.

print in_circle = geo_point_in_circle(200, 1, 1, 1, 1)

Output

in_circle

The following example returns a null result because of the invalid circle radius input.

print in_circle = geo_point_in_circle(1, 1, 1, 1, -1)

Output

in_circle

6.30 - geo_point_in_polygon()

Learn how to use the geo_point_in_polygon() function to check if the geospatial coordinates are inside a polygon or a multipolygon on Earth.

Calculates whether the geospatial coordinates are inside a polygon or a multipolygon on Earth.

Syntax

geo_point_in_polygon(longitude, latitude, polygon)

Parameters

NameTypeRequiredDescription
longitudereal✔️Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180].
latitudereal✔️Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90].
polygondynamic✔️Polygon or multipolygon in the GeoJSON format.

Returns

Indicates whether the geospatial coordinates are inside a polygon. If the coordinates or polygon is invalid, the query produces a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions is chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.
  • LinearRings must not cross and must not share edges. LinearRings might share vertices.
  • Polygon doesn’t necessarily contain its vertices. Point containment in polygon is defined so that if the Earth is subdivided into polygons, every point is contained by exactly one polygon.

Examples

The following example finds locations that fall within Manhattan Island, excluding the area of Central Park.

Screenshot of a map of the Manhattan area, with markers for a landmark, a museum, and an airport. The island appears dimmed except for Central Park.

datatable(longitude:real, latitude:real, description:string)
[
    real(-73.985654), 40.748487, 'Empire State Building',           // In Polygon 
    real(-73.963249), 40.779525, 'The Metropolitan Museum of Art',  // In exterior of polygon
    real(-73.874367), 40.777356, 'LaGuardia Airport',               // In exterior of polygon
]
| where geo_point_in_polygon(longitude, latitude, dynamic({"type":"Polygon","coordinates":[[[-73.92597198486328,40.87821814104651],[-73.94691467285156,40.85069618625578],[-73.94691467285156,40.841865966890786],[-74.01008605957031,40.7519385984599],[-74.01866912841797,40.704586878965245],[-74.01214599609375,40.699901911003046],[-73.99772644042969,40.70875101828792],[-73.97747039794922,40.71083299030839],[-73.97026062011719,40.7290474687069],[-73.97506713867186,40.734510840309376],[-73.970947265625,40.74543623770158],[-73.94210815429688,40.77586181063573],[-73.9434814453125,40.78080140115127],[-73.92974853515625,40.79691751000055],[-73.93077850341797,40.804454347291006],[-73.93489837646484,40.80965166748853],[-73.93524169921875,40.837190668541105],[-73.92288208007812,40.85770758108904],[-73.9101791381836,40.871728144624974],[-73.92597198486328,40.87821814104651]],[[-73.95824432373047,40.80071852197889],[-73.98206233978271,40.76815921628347],[-73.97309303283691,40.76422632379533],[-73.94914627075195,40.796949998204596],[-73.95824432373047,40.80071852197889]]]}))

Output

longitude | latitude | description
-73.985654 | 40.748487 | Empire State Building
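
As a minimal standalone sketch (arbitrary illustrative coordinates), the following query checks a point against a small square polygon and is expected to return true because the point lies in the polygon's interior.

print in_polygon = geo_point_in_polygon(5, 5, dynamic({"type":"Polygon","coordinates":[[[0,0],[10,0],[10,10],[0,10],[0,0]]]}))

Output

in_polygon
true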

The following example searches for coordinates in a multipolygon.

Screenshot of a map of the Manhattan area, with markers for the Upper West Side, Greenwich Village, and an airport. Three neighborhoods appear dimmed.

let multipolygon = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]],[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.7796396837
51753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.988886144117032,40.779878898532999],[-73.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]],[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.942930
43781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]]});
let coordinates = 
    datatable(longitude:real, latitude:real, description:string)
    [
        real(-73.9741), 40.7914, 'Upper West Side',    // In MultiPolygon
        real(-73.9950), 40.7340, 'Greenwich Village',  // In MultiPolygon
        real(-73.8743), 40.7773, 'LaGuardia Airport',  // In exterior of MultiPolygon
    ];
coordinates
| where geo_point_in_polygon(longitude, latitude, multipolygon)

Output

longitude | latitude | description
-73.9741 | 40.7914 | Upper West Side
-73.995 | 40.734 | Greenwich Village

The following example finds storm events in California. The events are filtered by a California state polygon and aggregated by event type and hash.

let california = dynamic({"type":"Polygon","coordinates":[[[-123.233256,42.006186],[-122.378853,42.011663],[-121.037003,41.995232],[-120.001861,41.995232],[-119.996384,40.264519],[-120.001861,38.999346],[-118.71478,38.101128],[-117.498899,37.21934],[-116.540435,36.501861],[-115.85034,35.970598],[-114.634459,35.00118],[-114.634459,34.87521],[-114.470151,34.710902],[-114.333228,34.448009],[-114.136058,34.305608],[-114.256551,34.174162],[-114.415382,34.108438],[-114.535874,33.933176],[-114.497536,33.697668],[-114.524921,33.54979],[-114.727567,33.40739],[-114.661844,33.034958],[-114.524921,33.029481],[-114.470151,32.843265],[-114.524921,32.755634],[-114.72209,32.717295],[-116.04751,32.624187],[-117.126467,32.536556],[-117.24696,32.668003],[-117.252437,32.876127],[-117.329114,33.122589],[-117.471515,33.297851],[-117.7837,33.538836],[-118.183517,33.763391],[-118.260194,33.703145],[-118.413548,33.741483],[-118.391641,33.840068],[-118.566903,34.042715],[-118.802411,33.998899],[-119.218659,34.146777],[-119.278905,34.26727],[-119.558229,34.415147],[-119.875891,34.40967],[-120.138784,34.475393],[-120.472878,34.448009],[-120.64814,34.579455],[-120.609801,34.858779],[-120.670048,34.902595],[-120.631709,35.099764],[-120.894602,35.247642],[-120.905556,35.450289],[-121.004141,35.461243],[-121.168449,35.636505],[-121.283465,35.674843],[-121.332757,35.784382],[-121.716143,36.195153],[-121.896882,36.315645],[-121.935221,36.638785],[-121.858544,36.6114],[-121.787344,36.803093],[-121.929744,36.978355],[-122.105006,36.956447],[-122.335038,37.115279],[-122.417192,37.241248],[-122.400761,37.361741],[-122.515777,37.520572],[-122.515777,37.783465],[-122.329561,37.783465],[-122.406238,38.15042],[-122.488392,38.112082],[-122.504823,37.931343],[-122.701993,37.893004],[-122.937501,38.029928],[-122.97584,38.265436],[-123.129194,38.451652],[-123.331841,38.566668],[-123.44138,38.698114],[-123.737134,38.95553],[-123.687842,39.032208],[-123.824765,39.366301],[-123.764519,39.552517],[-123.85215,39.831841],[-124.109566,40.105688],[-124.361506,40.259042],[-124.410798,40.439781],[-124.158859,40.877937],[-124.109566,41.025814],[-124.158859,41.14083],[-124.065751,41.442061],[-124.147905,41.715908],[-124.257444,41.781632],[-124.213628,42.000709],[-123.233256,42.006186]]]});
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_polygon(BeginLon, BeginLat, california)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat, 7)
| project geo_s2cell_to_central_point(hash), EventType, count_
| render piechart with (kind=map) // map rendering available in Kusto Explorer desktop

Output

Screenshot of storm events in California rendered on a map by event type as pie chart indicators.

The following example shows how to classify coordinates to polygons using the partition operator.

let Polygons = datatable(description:string, polygon:dynamic)
    [  
      "New York city area", dynamic({"type":"Polygon","coordinates":[[[-73.85009765625,40.85744791303121],[-74.16046142578125,40.84290487729676],[-74.190673828125,40.59935608796518],[-73.83087158203125,40.61812224225511],[-73.85009765625,40.85744791303121]]]}),
      "Seattle area",       dynamic({"type":"Polygon","coordinates":[[[-122.200927734375,47.68573021131587],[-122.4591064453125,47.68573021131587],[-122.4755859375,47.468949677672484],[-122.17620849609374,47.47266286861342],[-122.200927734375,47.68573021131587]]]}),
      "Las Vegas",          dynamic({"type":"Polygon","coordinates":[[[-114.9,36.36],[-115.4498291015625,36.33282808737917],[-115.4498291015625,35.84453450421662],[-114.949951171875,35.902399875143615],[-114.9,36.36]]]}),
    ];
let Locations = datatable(longitude:real, latitude:real)
    [
      real(-73.95),  real(40.75), // Somewhere in New York
      real(-122.3),  real(47.6),  // Somewhere in Seattle
      real(-115.18), real(36.16)  // Somewhere in Las Vegas
    ];
Polygons
| project polygonPartition = tostring(pack("description", description, "polygon", polygon))
| partition hint.materialized=true hint.strategy=native by polygonPartition
{   
     Locations
     | extend description = parse_json(toscalar(polygonPartition)).description
     | extend polygon = parse_json(toscalar(polygonPartition)).polygon
     | where geo_point_in_polygon(longitude, latitude, polygon)
     | project-away polygon
}

Output

longitude | latitude | description
-73.95 | 40.75 | New York city area
-122.3 | 47.6 | Seattle area
-115.18 | 36.16 | Las Vegas

See also geo_polygon_to_s2cells().

The following example folds several polygons into one multipolygon and checks locations that fall within the multipolygon.

let Polygons = 
    datatable(polygon:dynamic)
    [
        dynamic({"type":"Polygon","coordinates":[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]]}),
        dynamic({"type":"Polygon","coordinates":[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.779639683751753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.988886144117032,40.7798788985329
99],[-73.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]]}),
        dynamic({"type":"Polygon","coordinates":[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.94293043781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]}),
    ];
let Coordinates = 
    datatable(longitude:real, latitude:real, description:string)
    [
        real(-73.9741), 40.7914, 'Upper West Side',
        real(-73.9950), 40.7340, 'Greenwich Village',
        real(-73.8743), 40.7773, 'LaGuardia Airport',
    ];
let multipolygon = toscalar(
    Polygons
    | project individual_polygon = pack_array(polygon.coordinates)
    | summarize multipolygon_coordinates = make_list(individual_polygon)
    | project multipolygon = bag_pack("type","MultiPolygon", "coordinates", multipolygon_coordinates));
Coordinates
| where geo_point_in_polygon(longitude, latitude, multipolygon)

Output

| longitude | latitude | description |
|--|--|--|
| -73.9741 | 40.7914 | Upper West Side |
| -73.995 | 40.734 | Greenwich Village |

The following example returns a null result because of the invalid coordinate input.

print in_polygon = geo_point_in_polygon(200,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,1],[0,0]]]}))

Output

in_polygon

The following example returns a null result because of the invalid polygon input.

print in_polygon = geo_point_in_polygon(1,1,dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]}))

Output

in_polygon

6.31 - geo_point_to_geohash()

Learn how to use the geo_point_to_geohash() function to calculate the geohash string value of a geographic location.

Calculates the geohash string value of a geographic location.

Read more about geohash.

Syntax

geo_point_to_geohash(longitude, latitude, [ accuracy ])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
| latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
| accuracy | int | | Defines the requested accuracy. Supported values are in the range [1, 18]. If unspecified, the default value 5 is used. |

Returns

The geohash string value of a given geographic location with requested accuracy length. If the coordinate or accuracy is invalid, the query produces an empty result.

Geohash rectangular area coverage per accuracy value:

| Accuracy | Width | Height |
|--|--|--|
| 1 | 5000 km | 5000 km |
| 2 | 1250 km | 625 km |
| 3 | 156.25 km | 156.25 km |
| 4 | 39.06 km | 19.53 km |
| 5 | 4.88 km | 4.88 km |
| 6 | 1.22 km | 0.61 km |
| 7 | 152.59 m | 152.59 m |
| 8 | 38.15 m | 19.07 m |
| 9 | 4.77 m | 4.77 m |
| 10 | 1.19 m | 0.59 m |
| 11 | 149.01 mm | 149.01 mm |
| 12 | 37.25 mm | 18.63 mm |
| 13 | 4.66 mm | 4.66 mm |
| 14 | 1.16 mm | 0.58 mm |
| 15 | 145.52 μ | 145.52 μ |
| 16 | 36.28 μ | 18.19 μ |
| 17 | 4.55 μ | 4.55 μ |
| 18 | 1.14 μ | 0.57 μ |

See also geo_point_to_s2cell(), geo_point_to_h3cell().

Examples

The following example finds US storm events aggregated by geohash.

StormEvents
| project BeginLon, BeginLat
| summarize by hash=geo_point_to_geohash(BeginLon, BeginLat, 3)
| project geo_geohash_to_central_point(hash)
| render scatterchart with (kind=map)

Output

Screenshot of US storm events grouped by geohash.

The following example calculates and returns the geohash string value.

print geohash = geo_point_to_geohash(-80.195829, 25.802215, 8)

Output

geohash
dhwfz15h
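
The accuracy argument is optional. The following sketch reuses the coordinate above and omits it, so the default accuracy of 5 is used; because geohashes are hierarchical, the 5-character result should be the prefix of the 8-character geohash shown above.

print geohash = geo_point_to_geohash(-80.195829, 25.802215) // accuracy omitted; the default accuracy 5 yields a 5-character geohash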

The following example finds groups of coordinates. Every pair of coordinates in the group resides in a rectangular area of 4.88 km by 4.88 km.

datatable(location_id:string, longitude:real, latitude:real)
[
  "A", double(-122.303404), 47.570482,
  "B", double(-122.304745), 47.567052,
  "C", double(-122.278156), 47.566936,
]
| summarize count = count(),                                          // items per group count
            locations = make_list(location_id)                        // items in the group
            by geohash = geo_point_to_geohash(longitude, latitude)    // geohash of the group

Output

| geohash | count | locations |
|--|--|--|
| c23n8 | 2 | [“A”, “B”] |
| c23n9 | 1 | [“C”] |

The following example produces an empty result because of the invalid coordinate input.

print geohash = geo_point_to_geohash(200,1,8)

Output

geohash

The following example produces an empty result because of the invalid accuracy input.

print geohash = geo_point_to_geohash(1,1,int(null))

Output

geohash

6.32 - geo_point_to_h3cell()

Learn how to use the geo_point_to_h3cell() function to calculate the H3 Cell token string value of a geographic location.

Calculates the H3 Cell token string value of a geographic location.

Read more about H3 Cell.

Syntax

geo_point_to_h3cell(longitude, latitude, [ resolution ])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
| latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
| resolution | int | | Defines the requested cell resolution. Supported values are in the range [0, 15]. If unspecified, the default value 6 is used. |

Returns

The H3 Cell token string value of a given geographic location. If the coordinates or resolution are invalid, the query produces an empty result.

H3 Cell approximate area coverage per resolution value

| Level | Average Hexagon Edge Length |
|--|--|
| 0 | 1108 km |
| 1 | 419 km |
| 2 | 158 km |
| 3 | 60 km |
| 4 | 23 km |
| 5 | 8 km |
| 6 | 3 km |
| 7 | 1 km |
| 8 | 460 m |
| 9 | 174 m |
| 10 | 66 m |
| 11 | 25 m |
| 12 | 9 m |
| 13 | 3 m |
| 14 | 1 m |
| 15 | 0.5 m |

The table source can be found in this H3 Cell statistical resource.

See also geo_point_to_s2cell(), geo_point_to_geohash().

Examples

print h3cell = geo_point_to_h3cell(-74.04450446039874, 40.689250859314974, 6)

Output

h3cell
862a1072fffffff

The following example finds groups of coordinates. Every pair of coordinates in the group resides in the H3 Cell with average hexagon area of 253 km².

datatable(location_id:string, longitude:real, latitude:real)
[
    "A", -73.956683, 40.807907,
    "B", -73.916869, 40.818314,
    "C", -73.989148, 40.743273,
]
| summarize count = count(),                                         // Items per group count
            locations = make_list(location_id)                       // Items in the group
            by h3cell = geo_point_to_h3cell(longitude, latitude, 5)  // H3 Cell of the group

Output

| h3cell | count | locations |
|--|--|--|
| 852a100bfffffff | 2 | [“A”, “B”] |
| 852a1073fffffff | 1 | [“C”] |

The following example produces an empty result because of the invalid coordinate input.

print h3cell = geo_point_to_h3cell(300,1,8)

Output

h3cell

The following example produces an empty result because of the invalid level input.

print h3cell = geo_point_to_h3cell(1,1,16)

Output

h3cell

The following example produces an empty result because of the invalid level input.

print h3cell = geo_point_to_h3cell(1,1,int(null))

Output

h3cell

6.33 - geo_point_to_s2cell()

Learn how to use the geo_point_to_s2cell() function to calculate the S2 cell token string value of a geographic location.

Calculates the S2 cell token string value of a geographic location.

Read more about S2 cell hierarchy. The S2 cell can be a useful geospatial clustering tool. An S2 cell is a cell on a spherical surface with geodesic edges. S2 cells are part of a hierarchy dividing up the Earth’s surface. The hierarchy has 31 levels, ranging from 0 to 30, which define the number of times a cell is subdivided. Levels range from the largest coverage at level 0, with an area of 85,011,012.19 km², to the smallest coverage of 0.44 cm² at level 30. As S2 cells are subdivided at higher levels, the cell center is preserved well. Two geographic locations can be very close to each other and yet have different S2 cell tokens.

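As a minimal sketch of how a point maps to a cell and how the cell center relates to it, the following query combines geo_point_to_s2cell() with geo_s2cell_to_central_point(), both covered in this article; the coordinate is illustrative only.

let lng = -73.9741; // illustrative coordinate, also used elsewhere in this article
let lat = 40.7914;
print s2cell = geo_point_to_s2cell(lng, lat, 11) // cell token at level 11 (the default)
| extend cell_center = geo_s2cell_to_central_point(s2cell) // center of that cell as a GeoJSON Point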

Syntax

geo_point_to_s2cell(longitude, latitude, [ level ])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| longitude | real | ✔️ | Geospatial coordinate, longitude value in degrees. Valid value is a real number and in the range [-180, +180]. |
| latitude | real | ✔️ | Geospatial coordinate, latitude value in degrees. Valid value is a real number and in the range [-90, +90]. |
| level | int | | Defines the requested cell level. Supported values are in the range [0, 30]. If unspecified, the default value 11 is used. |

Returns

The S2 cell token string value of a given geographic location. If the coordinates or level are invalid, the query produces an empty result.

S2 cell approximate area coverage per level value

Within a given level, S2 cells are similar in size but not identical; cells that are near each other tend to be closer in size.

| Level | Minimum random cell edge length (UK) | Maximum random cell edge length (US) |
|--|--|--|
| 0 | 7842 km | 7842 km |
| 1 | 3921 km | 5004 km |
| 2 | 1825 km | 2489 km |
| 3 | 840 km | 1310 km |
| 4 | 432 km | 636 km |
| 5 | 210 km | 315 km |
| 6 | 108 km | 156 km |
| 7 | 54 km | 78 km |
| 8 | 27 km | 39 km |
| 9 | 14 km | 20 km |
| 10 | 7 km | 10 km |
| 11 | 3 km | 5 km |
| 12 | 1699 m | 2 km |
| 13 | 850 m | 1225 m |
| 14 | 425 m | 613 m |
| 15 | 212 m | 306 m |
| 16 | 106 m | 153 m |
| 17 | 53 m | 77 m |
| 18 | 27 m | 38 m |
| 19 | 13 m | 19 m |
| 20 | 7 m | 10 m |
| 21 | 3 m | 5 m |
| 22 | 166 cm | 2 m |
| 23 | 83 cm | 120 cm |
| 24 | 41 cm | 60 cm |
| 25 | 21 cm | 30 cm |
| 26 | 10 cm | 15 cm |
| 27 | 5 cm | 7 cm |
| 28 | 2 cm | 4 cm |
| 29 | 12 mm | 18 mm |
| 30 | 6 mm | 9 mm |

The table source can be found in this S2 Cell statistical resource.

Examples

US storm events aggregated by S2 cell

The following example finds US storm events aggregated by S2 cells.

StormEvents
| project BeginLon, BeginLat
| summarize by hash=geo_point_to_s2cell(BeginLon, BeginLat, 5)
| project geo_s2cell_to_central_point(hash)
| render scatterchart with (kind=map)

Output

Screenshot of a map rendering of US storm events aggregated by S2 cell.

The following example calculates the S2 cell ID.

print s2cell = geo_point_to_s2cell(-80.195829, 25.802215, 8)

Output

s2cell
88d9b

Find a group of coordinates

The following example finds groups of coordinates. Every pair of coordinates in the group resides in the S2 cell with a maximum area of 1632.45 km².

datatable(location_id:string, longitude:real, latitude:real)
[
  "A", 10.1234, 53,
  "B", 10.3579, 53,
  "C", 10.6842, 53,
]
| summarize count = count(),                                        // items per group count
            locations = make_list(location_id)                      // items in the group
            by s2cell = geo_point_to_s2cell(longitude, latitude, 8) // s2 cell of the group

Output

| s2cell | count | locations |
|--|--|--|
| 47b1d | 2 | [“A”,“B”] |
| 47ae3 | 1 | [“C”] |

Empty results

The following example produces an empty result because of the invalid coordinate input.

print s2cell = geo_point_to_s2cell(300,1,8)

Output

s2cell

The following example produces an empty result because of the invalid level input.

print s2cell = geo_point_to_s2cell(1,1,35)

Output

s2cell

The following example produces an empty result because of the invalid level input.

print s2cell = geo_point_to_s2cell(1,1,int(null))

Output

s2cell

6.34 - geo_polygon_area()

Learn how to use the geo_polygon_area() function to calculate the area of a polygon or a multipolygon on Earth.

Calculates the area of a polygon or a multipolygon on Earth.

Syntax

geo_polygon_area(polygon)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |

Returns

The area of a polygon or a multipolygon, in square meters, on Earth. If the polygon or the multipolygon is invalid, the query will produce a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.

Examples

The following example calculates NYC Central Park area.

let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print area = geo_polygon_area(central_park)

Output

area
3475207.28346606
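
The constraints above also allow interior rings (holes): the shell is ordered counterclockwise and each hole clockwise. The following is a minimal, hypothetical sketch of a 2°×2° shell with a 1°×1° hole; the coordinates are illustrative only.

let polygon_with_hole = dynamic({"type":"Polygon","coordinates":[[[0,0],[2,0],[2,2],[0,2],[0,0]],[[0.5,0.5],[0.5,1.5],[1.5,1.5],[1.5,0.5],[0.5,0.5]]]});
print area = geo_polygon_area(polygon_with_hole) // the hole's area is excluded from the returned value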

The following example performs a union of the polygons in the multipolygon and calculates the area of the unified polygon.

let polygons = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]],[[[-73.94262313842773,40.775991804565585],[-73.98107528686523,40.791849155467695],[-73.99600982666016,40.77092185281977],[-73.96150588989258,40.75609977566361],[-73.94262313842773,40.775991804565585]]]]});
print polygons_union_area = geo_polygon_area(polygons)

Output

polygons_union_area
10889971.5343487

The following example finds the five largest US states by area.

US_States
| project name = features.properties.NAME, polygon = geo_polygon_densify(features.geometry)
| project name, area = geo_polygon_area(polygon)
| top 5 by area desc

Output

| name | area |
|--|--|
| Alaska | 1550934810070.61 |
| Texas | 693231378868.483 |
| California | 410339536449.521 |
| Montana | 379583933973.436 |
| New Mexico | 314979912310.579 |

The following example returns True because of the invalid polygon.

print isnull(geo_polygon_area(dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]})))

Output

print_0
True

6.35 - geo_polygon_buffer()

Learn how to use the geo_polygon_buffer() function to calculate the buffer of a polygon or a multipolygon on Earth.

Calculates the polygon or multipolygon that contains all points within the given radius of the input polygon or multipolygon on Earth.

Syntax

geo_polygon_buffer(polygon, radius, tolerance)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
| radius | real | ✔️ | Buffer radius in meters. Valid value must be positive. |
| tolerance | real | | Defines the tolerance in meters that determines how much a polygon can deviate from the ideal radius. If unspecified, the default value 10 is used. Tolerance should be no lower than 0.0001% of the radius. If the specified tolerance is greater than the radius, it's lowered to the largest possible value below the radius. |

Returns

A polygon or multipolygon around the input polygon or multipolygon. If the coordinates, radius, or tolerance is invalid, the query produces a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [LinearRingShell, LinearRingHole_1, …, LinearRingHole_N]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1], …, [lng_i,lat_i], …,[lng_j,lat_j], …,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1], …,[lng_i,lat_i], …,[lng_j,lat_j], …,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.
  • Polygon contains its vertices.

Examples

The following query calculates the buffered polygon around the input polygon, using a radius of 10 km.

let polygon = dynamic({"type":"Polygon","coordinates":[[[139.813757,35.719666],[139.72558,35.71813],[139.727471,35.653231],[139.818721,35.657264],[139.813757,35.719666]]]});
print buffer = geo_polygon_buffer(polygon, 10000)
Output

buffer
{“type”: “Polygon”,“coordinates”: [ … ]}
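
The tolerance argument is optional. The following sketch passes an explicit tolerance of 100 meters for the same 10 km buffer; the values are illustrative only.

let polygon = dynamic({"type":"Polygon","coordinates":[[[139.813757,35.719666],[139.72558,35.71813],[139.727471,35.653231],[139.818721,35.657264],[139.813757,35.719666]]]});
print buffer = geo_polygon_buffer(polygon, 10000, 100) // 10 km radius; the buffer may deviate up to 100 m from the ideal radius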

The following query calculates a buffer around each polygon and unifies the results.

datatable(polygon:dynamic, radius:real )
[
    dynamic({"type":"Polygon","coordinates":[[[12.451218693639277,41.906457003556625],[12.445753852969375,41.90160968881543],[12.453514425793855,41.90361551885886],[12.451218693639277,41.906457003556625]]]}), 100,
    dynamic({"type":"Polygon","coordinates":[[[12.4566086734784,41.905119850039995],[12.453913683559591,41.903652663265234],[12.455485761012113,41.90146110630562],[12.4566086734784,41.905119850039995]]]}), 20
]
| project buffer = geo_polygon_buffer(polygon, radius)
| summarize polygons = make_list(buffer)
| project result = geo_union_polygons_array(polygons)
Output

result
{“type”: “Polygon”,“coordinates”: [ … ]}

The following example returns true because of the invalid polygon input.

print buffer = isnull(geo_polygon_buffer(dynamic({"type":"p"}), 1))
Output

buffer
True

The following example returns true because of the invalid radius input.

print buffer = isnull(geo_polygon_buffer(dynamic({"type":"Polygon","coordinates":[[[10,10],[0,10],[0,0],[10,10]]]}), 0))
Output

buffer
True

6.36 - geo_polygon_centroid()

Learn how to use the geo_polygon_centroid() function to calculate the centroid of a polygon or a multipolygon on Earth.

Calculates the centroid of a polygon or a multipolygon on Earth.

Syntax

geo_polygon_centroid(polygon)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |

Returns

The centroid coordinate values in GeoJSON format and of a dynamic data type. If the polygon or multipolygon is invalid, the query produces a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions and chooses the smaller of the two regions.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices is chosen.
  • LinearRings must not cross and must not share edges. LinearRings might share vertices.

Examples

The following example calculates the Central Park centroid in New York City.

let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print centroid = geo_polygon_centroid(central_park)

Output

centroid
{“type”: “Point”, “coordinates”: [-73.965735689907618, 40.782550538057812]}

The following example calculates the Central Park centroid longitude.

let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print 
centroid = geo_polygon_centroid(central_park)
| project lng = centroid.coordinates[0]

Output

lng
-73.9657356899076

The following example performs a union of the polygons in the multipolygon and calculates the centroid of the unified polygon.

let polygons = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]],[[[-73.94262313842773,40.775991804565585],[-73.98107528686523,40.791849155467695],[-73.99600982666016,40.77092185281977],[-73.96150588989258,40.75609977566361],[-73.94262313842773,40.775991804565585]]]]});
print polygons_union_centroid = geo_polygon_centroid(polygons)

Output

polygons_union_centroid
{“type”: “Point”, “coordinates”: [-73.968569587829577, 40.776310752555119]}

The following example visualizes the Central Park centroid on a map.

let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print 
centroid = geo_polygon_centroid(central_park)
| render scatterchart with (kind = map)

Output

Screenshot of New York City Central park centroid.

The following example returns true because of the invalid polygon.

print isnull(geo_polygon_centroid(dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]})))

Output

print_0
true

6.37 - geo_polygon_densify()

Learn how to use the geo_polygon_densify() function to convert polygon or multipolygon planar edges to geodesics.

Converts polygon or multipolygon planar edges to geodesics by adding intermediate points.

Syntax

geo_polygon_densify(polygon, tolerance, [ preserve_crossing ])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
| tolerance | int, long, or real | | Defines maximum distance in meters between the original planar edge and the converted geodesic edge chain. Supported values are in the range [0.1, 10000]. If unspecified, the default value is 10. |
| preserve_crossing | bool | | If true, preserves edge crossing over antimeridian. If unspecified, the default value false is used. |

Polygon definition

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Constraints

  • The maximum number of points in the densified polygon is limited to 10485760.
  • Storing polygons in dynamic format has size limits.
  • Densifying a valid polygon may invalidate the polygon. The algorithm adds points in a non-uniform manner, and as such may cause edges to intertwine with each other.

Motivation

  • GeoJSON format defines an edge between two points as a straight cartesian line while geo_polygon_densify() uses geodesic.
  • The decision to use geodesic or planar edges might depend on the dataset and is especially relevant in long edges.

Returns

Densified polygon in the GeoJSON format and of a dynamic data type. If either the polygon or tolerance is invalid, the query produces a null result.

Examples

The following example densifies the Manhattan Central Park polygon. The edges are short, and the distance between the planar edges and their geodesic counterparts is less than the distance specified by the tolerance. As a result, the polygon remains unchanged.

print densified_polygon = tostring(geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[-73.958244,40.800719],[-73.949146,40.79695],[-73.973093,40.764226],[-73.982062,40.768159],[-73.958244,40.800719]]]})))

Output

densified_polygon
{“type”:“Polygon”,“coordinates”:[[[-73.958244,40.800719],[-73.949146,40.79695],[-73.973093,40.764226],[-73.982062,40.768159],[-73.958244,40.800719]]]}

The following example densifies two edges of the polygon. The densified edges are ~110 km long.

print densified_polygon = tostring(geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]})))

Output

densified_polygon
{“type”:“Polygon”,“coordinates”:[[[10,10],[10.25,10],[10.5,10],[10.75,10],[11,10],[11,11],[10.75,11],[10.5,11],[10.25,11],[10,11],[10,10]]]}
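
Building on the example above, the following sketch counts vertices before and after densification; with the default tolerance, the output above shows each of the two densified edges gaining three intermediate points.

let polygon = dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]});
print densified = geo_polygon_densify(polygon)
| extend original_vertex_count = array_length(polygon.coordinates[0]), // vertices in the input ring
         densified_vertex_count = array_length(densified.coordinates[0]) // vertices after intermediate points are added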

The following example returns a null result because of the invalid coordinate input.

print densified_polygon = geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[10,900],[11,10],[11,11],[10,11],[10,10]]]}))

Output

densified_polygon

The following example returns a null result because of the invalid tolerance input.

print densified_polygon = geo_polygon_densify(dynamic({"type":"Polygon","coordinates":[[[10,10],[11,10],[11,11],[10,11],[10,10]]]}), 0)

Output

densified_polygon

6.38 - geo_polygon_perimeter()

Learn how to use the geo_polygon_perimeter() function to calculate the length of the boundary of a polygon or a multipolygon on Earth.

Calculates the length of the boundary of a polygon or a multipolygon on Earth.

Syntax

geo_polygon_perimeter(polygon)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |

Returns

The length of the boundary of a polygon or a multipolygon, in meters, on Earth. If the polygon or multipolygon is invalid, the query produces a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.

Examples

The following example calculates the NYC Central Park perimeter, in meters.

let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print perimeter = geo_polygon_perimeter(central_park)

Output

perimeter
9930.30149604938
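
Because geo_polygon_area(), covered earlier in this article, accepts the same input, a minimal sketch can report both measures for the same polygon; the results should match the area and perimeter values shown for this polygon in this article.

let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print area_m2 = geo_polygon_area(central_park), perimeter_m = geo_polygon_perimeter(central_park)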

The following example performs a union of the polygons in the multipolygon and calculates the perimeter of the unified polygon.

let polygons = dynamic({"type":"MultiPolygon","coordinates":[[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]],[[[-73.94262313842773,40.775991804565585],[-73.98107528686523,40.791849155467695],[-73.99600982666016,40.77092185281977],[-73.96150588989258,40.75609977566361],[-73.94262313842773,40.775991804565585]]]]});
print perimeter = geo_polygon_perimeter(polygons)

Output

perimeter
15943.5384578745

The following example returns True because of the invalid polygon.

print is_invalid = isnull(geo_polygon_perimeter(dynamic({"type": "Polygon","coordinates": [[[0,0],[10,10],[10,10],[0,0]]]})))

Output

is_invalid
True

6.39 - geo_polygon_simplify()

Learn how to use the geo_polygon_simplify() function to simplify a polygon or a multipolygon.

Simplifies a polygon or a multipolygon by replacing nearly straight chains of short edges with a single long edge on Earth.

Syntax

geo_polygon_simplify(polygon, tolerance)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
| tolerance | int, long, or real | | Defines the minimum distance in meters between any two vertices. Supported values are in the range [0.1, 10000]. If unspecified, the default value is 10. |

Returns

A simplified polygon or multipolygon in the GeoJSON format and of a dynamic data type, with no two vertices closer than the tolerance. If either the polygon or the tolerance is invalid, the query produces a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.

Examples

The following example simplifies polygons by removing vertices that are within a 10-meter distance from each other.

let polygon = dynamic({"type":"Polygon","coordinates":[[[-73.94885122776031,40.79673476355657],[-73.94885927438736,40.79692258628347],[-73.94887939095497,40.79692055577034],[-73.9488673210144,40.79693476936093],[-73.94888743758202,40.79693476936093],[-73.9488834142685,40.796959135509105],[-73.94890084862709,40.79695304397289],[-73.94906312227248,40.79710736271788],[-73.94923612475395,40.7968708081794],[-73.94885122776031,40.79673476355657]]]});
print simplified = geo_polygon_simplify(polygon)

Output

simplified
{“type”: “Polygon”, “coordinates”: [[[-73.948851227760315, 40.796734763556572],[-73.949063122272477, 40.797107362717881],[-73.949236124753952, 40.7968708081794],[-73.948851227760315, 40.796734763556572]]]}

The following example simplifies polygons and combines the results into a GeoJSON geometry collection.

Polygons
| project polygon = features.geometry
| project simplified = geo_polygon_simplify(polygon, 1000)
| summarize lst = make_list(simplified)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", lst), "properties", bag_pack("name", "polygons"))

Output

geojson
{“type”: “Feature”, “geometry”: {“type”: “GeometryCollection”, “geometries”: [ … ]}, “properties”: {“name”: “polygons”}}

The following example simplifies polygons and unifies the results.

US_States
| project polygon = features.geometry
| project simplified = geo_polygon_simplify(polygon, 1000)
| summarize lst = make_list(simplified)
| project polygons = geo_union_polygons_array(lst)

Output

polygons
{“type”: “MultiPolygon”, “coordinates”: [ … ]}

The following example returns True because of the invalid polygon.

let polygon = dynamic({"type":"Polygon","coordinates":[[[5,48],[5,48]]]});
print is_invalid_polygon = isnull(geo_polygon_simplify(polygon))

Output

is_invalid_polygon
1

The following example returns True because of the invalid tolerance.

let polygon = dynamic({"type":"Polygon","coordinates":[[[5,48],[0,50],[0,47],[4,47],[5,48]]]});
print is_invalid_polygon = isnull(geo_polygon_simplify(polygon, -0.1))

Output

is_invalid_polygon
1

The following example returns True because the high tolerance causes the polygon to disappear.

let polygon = dynamic({"type":"Polygon","coordinates":[[[5,48],[0,50],[0,47],[4,47],[5,48]]]});
print is_invalid_polygon = isnull(geo_polygon_simplify(polygon, 1000000))

Output

is_invalid_polygon
1

6.40 - geo_polygon_to_h3cells()

Learn how to use the geo_polygon_to_h3cells() function to convert a polygon to H3 cells.

Converts a polygon to H3 cells. This function is a useful geospatial join and visualization tool.

Syntax

geo_polygon_to_h3cells(polygon [, resolution[, radius]])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
| resolution | int | | Defines the requested cell resolution. Supported values are in the range [0, 15]. If unspecified, the default value 6 is used. |
| radius | real | | Buffer radius in meters. If unspecified, the default value 0 is used. |

Returns

An array of H3 cell token strings of the same resolution that represent a polygon or a multipolygon. If the radius is set to a positive value, the polygon is enlarged so that all points within the given radius of the input polygon or multipolygon are contained inside, and the newly calculated polygon is converted to H3 cells. If the polygon, resolution, or radius is invalid, or if the cell count exceeds the limit, the query produces a null result.

See also geo_polygon_to_s2cells().

Examples

The following example calculates H3 cells that approximate the polygon.

let polygon = dynamic({"type":"Polygon","coordinates":[[[-3.659,40.553],[-3.913,40.409],[-3.729,40.273],[-3.524,40.440],[-3.659,40.553]]]});
print h3_cells = geo_polygon_to_h3cells(polygon)

Output

h3_cells
[“86390cb57ffffff”,“86390cb0fffffff”,“86390ca27ffffff”,“86390cb87ffffff”,“86390cb07ffffff”,“86390ca2fffffff”,“86390ca37ffffff”,“86390cb17ffffff”,“86390cb1fffffff”,“86390cb8fffffff”,“86390cba7ffffff”,“86390ca07ffffff”,“86390cbafffffff”]
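
The resolution and radius arguments are optional. The following sketch covers the same polygon at an explicit resolution of 6 and with a 500-meter buffer radius; the values are illustrative only.

let polygon = dynamic({"type":"Polygon","coordinates":[[[-3.659,40.553],[-3.913,40.409],[-3.729,40.273],[-3.524,40.440],[-3.659,40.553]]]});
print h3_cells = geo_polygon_to_h3cells(polygon, 6, 500) // the polygon is enlarged by a 500 m buffer before conversion
| extend cell_count = array_length(h3_cells) // number of covering cells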

The following example builds a multipolygon from the H3 cells that approximate the above polygon. Specifying a higher resolution improves the polygon approximation.

let polygon = dynamic({"type":"Polygon","coordinates":[[[-3.659,40.553],[-3.913,40.409],[-3.729,40.273],[-3.524,40.440],[-3.659,40.553]]]});
print h3_cells = geo_polygon_to_h3cells(polygon)
| mv-expand cell = h3_cells to typeof(string) // extract cell to a separate row
| project polygon_cell = geo_h3cell_to_polygon(cell) // convert each cell to a polygon
| project individual_polygon_coordinates = pack_array(polygon_cell.coordinates)
| summarize multipolygon_coordinates = make_list(individual_polygon_coordinates)
| project multipolygon = bag_pack("type","MultiPolygon", "coordinates", multipolygon_coordinates)

Output

multipolygon
{“type”: “MultiPolygon”,
“coordinates”: [ … ]}

The following example returns null because the polygon is invalid.

let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[1,1]]]});
print is_null = isnull(geo_polygon_to_h3cells(polygon))

Output

is_null
True

6.41 - geo_polygon_to_s2cells()

Learn how to use the geo_polygon_to_s2cells() function to calculate S2 cell tokens that cover a polygon or a multipolygon on Earth.

Calculates S2 cell tokens that cover a polygon or multipolygon on Earth. This function is a useful geospatial join tool.

Read more about S2 cell hierarchy.

Syntax

geo_polygon_to_s2cells(polygon [, level[, radius]])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| polygon | dynamic | ✔️ | Polygon or multipolygon in the GeoJSON format. |
| level | int | | Defines the requested cell level. Supported values are in the range [0, 30]. If unspecified, the default value 11 is used. |
| radius | real | | Buffer radius in meters. If unspecified, the default value 0 is used. |

Returns

An array of S2 cell token strings that cover a polygon or a multipolygon. If the radius is set to a positive value, the covering includes, in addition to the input shape, all points within the given radius of the input geometry. If the polygon, level, or radius is invalid, or if the cell count exceeds the limit, the query produces a null result.

Motivation for covering polygons with S2 cell tokens

Without this function, here’s one approach for classifying coordinates into the polygons that contain them.

let Polygons = 
    datatable(description:string, polygon:dynamic)
    [  
      "New York",  dynamic({"type":"Polygon","coordinates":[[[-73.85009765625,40.85744791303121],[-74.16046142578125,40.84290487729676],[-74.190673828125,40.59935608796518],[-73.83087158203125,40.61812224225511],[-73.85009765625,40.85744791303121]]]}),
      "Seattle",   dynamic({"type":"Polygon","coordinates":[[[-122.200927734375,47.68573021131587],[-122.4591064453125,47.68573021131587],[-122.4755859375,47.468949677672484],[-122.17620849609374,47.47266286861342],[-122.200927734375,47.68573021131587]]]}),
      "Las Vegas", dynamic({"type":"Polygon","coordinates":[[[-114.9,36.36],[-115.4498291015625,36.33282808737917],[-115.4498291015625,35.84453450421662],[-114.949951171875,35.902399875143615],[-114.9,36.36]]]}),
    ];
let Coordinates = 
    datatable(longitude:real, latitude:real)
    [
      real(-73.95),  real(40.75), // New York
      real(-122.3),  real(47.6),  // Seattle
      real(-115.18), real(36.16)  // Las Vegas
    ];
Polygons | extend dummy=1
| join kind=inner (Coordinates | extend dummy=1) on dummy
| where geo_point_in_polygon(longitude, latitude, polygon)
| project longitude, latitude, description

Output

| longitude | latitude | description |
|--|--|--|
| -73.95 | 40.75 | New York |
| -122.3 | 47.6 | Seattle |
| -115.18 | 36.16 | Las Vegas |

While this method works in some cases, it’s inefficient. It performs a cross-join, meaning that it tries to match every polygon to every point, which consumes a large amount of memory and compute resources. Instead, we want to test each point only against the polygons that have a high probability of containing it, and filter out the other pairs.

This match can be achieved by the following process:

  1. Converting polygons to S2 cells of level k,
  2. Converting points to the same S2 cells level k,
  3. Joining on S2 cells,
  4. Filtering by geo_point_in_polygon(). This phase can be omitted if some number of false positives is acceptable. The maximum error is the area of the S2 cells at level k that extend beyond the polygon boundary.

Choosing the S2 cell level

  • Ideally we would want to cover every polygon with one or just a few unique cells such that no two polygons share the same cell.
  • If the polygons are close to each other, choose the S2 cell level such that its cell edge will be smaller (4, 8, 12 times smaller) than the edge of the average polygon.
  • If the polygons are far from each other, choose the S2 cell level such that its cell edge will be similar or bigger than the edge of the average polygon.
  • In practice, covering a polygon with more than 10,000 cells might not yield good performance.
  • Sample use cases:
    • S2 cell level 5 might prove to be good for covering countries/regions.
    • S2 cell level 16 can cover dense and relatively small Manhattan (New York) neighborhoods.
    • S2 cell level 11 can be used for covering suburbs of Australia.
  • Query run time and memory consumption might differ greatly because of different S2 cell level values.
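
One way to gauge these trade-offs before running a full join is to compare covering sizes at candidate levels. The following sketch reuses the polygon from the cell-count example later in this section; the chosen levels are illustrative only.

let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print cells_at_level_4 = array_length(geo_polygon_to_s2cells(polygon, 4)), // coarser covering, fewer cells
      cells_at_level_6 = array_length(geo_polygon_to_s2cells(polygon, 6))  // finer covering, more cells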

Examples

The following example classifies coordinates into polygons.

let Polygons = 
    datatable(description:string, polygon:dynamic)
    [
        'Greenwich Village', dynamic({"type":"Polygon","coordinates":[[[-73.991460000000131,40.731738000000206],[-73.992854491775518,40.730082566051351],[-73.996772,40.725432000000154],[-73.997634685522883,40.725786309886963],[-74.002855946639244,40.728346630056791],[-74.001413,40.731065000000207],[-73.996796995070824,40.73736378205173],[-73.991724524037934,40.735245208931886],[-73.990703782359589,40.734781896080477],[-73.991460000000131,40.731738000000206]]]}),
        'Upper West Side',   dynamic({"type":"Polygon","coordinates":[[[-73.958357552055688,40.800369095633819],[-73.98143901556422,40.768762584141953],[-73.981548752788598,40.7685590292784],[-73.981565335901905,40.768307084720796],[-73.981754418060945,40.768399727738668],[-73.982038573548124,40.768387823012056],[-73.982268248204349,40.768298621883247],[-73.982384797518051,40.768097213086911],[-73.982320919746599,40.767894461792181],[-73.982155532845766,40.767756204474757],[-73.98238873834039,40.767411004834273],[-73.993650353659021,40.772145571634361],[-73.99415893763998,40.772493009137818],[-73.993831082030937,40.772931787850908],[-73.993891252437052,40.772955194876722],[-73.993962585514595,40.772944653908901],[-73.99401262480508,40.772882846631894],[-73.994122058082397,40.77292405902601],[-73.994136652588594,40.772901870174394],[-73.994301342391154,40.772970028663913],[-73.994281535134448,40.77299380206933],[-73.994376552751078,40.77303955110149],[-73.994294029824005,40.773156243992048],[-73.995023275860802,40.773481196576356],[-73.99508939189289,40.773388475039134],[-73.995013963716758,40.773358035426909],[-73.995050284699261,40.773297153189958],[-73.996240651898916,40.773789791397689],[-73.996195837470992,40.773852356184044],[-73.996098807369748,40.773951805299085],[-73.996179459973888,40.773986954351571],[-73.996095245226442,40.774086186437756],[-73.995572265161172,40.773870731394297],[-73.994017424135961,40.77321375261053],[-73.993935876811335,40.773179512586211],[-73.993861942928888,40.773269531698837],[-73.993822393527211,40.773381758622882],[-73.993767019318497,40.773483981224835],[-73.993698463744295,40.773562141052594],[-73.993358326468751,40.773926888327956],[-73.992622663865575,40.774974056037109],[-73.992577842766124,40.774956016359418],[-73.992527743951555,40.775002110439829],[-73.992469745815342,40.775024159551755],[-73.992403837191887,40.775018140390664],[-73.99226708903538,40.775116033858794],[-73.99217809026365,40.775279293897171],[-73.992059084937338,40.775497598192516],[-73.992125372394938,40.775509075053385],[-73.992226867797001,40.775482211026116],[-73.992329346608813,40.775468900958522],[-73.992361756801131,40.775501899766638],[-73.992386042960277,40.775557180424634],[-73.992087684712729,40.775983970821372],[-73.990927174149746,40.777566878763238],[-73.99039616003671,40.777585065679204],[-73.989461267506471,40.778875124584417],[-73.989175778438053,40.779287524015778],[-73.988868617400072,40.779692922911607],[-73.988871874499793,40.779713738253008],[-73.989219022880576,40.779697895209402],[-73.98927785904425,40.779723439271038],[-73.989409054180143,40.779737706471963],[-73.989498614927044,40.779725044389757],[-73.989596493388234,40.779698146683387],[-73.989679812902509,40.779677568658038],[-73.989752702937935,40.779671244211556],[-73.989842247806507,40.779680752670664],[-73.990040102120489,40.779707677698219],[-73.990137977524839,40.779699769704784],[-73.99033584033225,40.779661794394983],[-73.990430598697046,40.779664973055503],[-73.990622199396725,40.779676064914298],[-73.990745069505479,40.779671328184051],[-73.990872114282197,40.779646007643876],[-73.990961672224358,40.779639683751753],[-73.991057472829539,40.779652352625774],[-73.991157429497036,40.779669775606465],[-73.991242817404469,40.779671367084504],[-73.991255318289745,40.779650782516491],[-73.991294887120119,40.779630209208889],[-73.991321967649895,40.779631796041372],[-73.991359455569423,40.779585883337383],[-73.991551059227476,40.779574821437407],[-73.99141982585985,40.779755280287233],[-73.98888614411
7032,40.779878898532999],[-73.988939656706265,40.779956178440393],[-73.988926103530844,40.780059292013632],[-73.988911680264692,40.780096037146606],[-73.988919261468567,40.780226094343945],[-73.988381050202634,40.780981074045783],[-73.988232413846987,40.781233144215555],[-73.988210420831663,40.781225482542055],[-73.988140000000143,40.781409000000224],[-73.988041288067166,40.781585961353777],[-73.98810029382463,40.781602878305286],[-73.988076449145055,40.781650935001608],[-73.988018059972219,40.781634188810422],[-73.987960792842145,40.781770987031535],[-73.985465811970457,40.785360700575431],[-73.986172704965611,40.786068452258647],[-73.986455862401996,40.785919219081421],[-73.987072345615601,40.785189638820121],[-73.98711901394276,40.785210319004058],[-73.986497781023601,40.785951202887254],[-73.986164628806279,40.786121882448327],[-73.986128422486075,40.786239001331111],[-73.986071135219746,40.786240706026611],[-73.986027274789123,40.786228964236727],[-73.986097637849426,40.78605822569795],[-73.985429321269592,40.785413942184597],[-73.985081137732209,40.785921935110366],[-73.985198833254501,40.785966552197777],[-73.985170502389906,40.78601333415817],[-73.985216218673656,40.786030501816427],[-73.98525509797993,40.785976205511588],[-73.98524273937646,40.785972572653328],[-73.98524962933017,40.785963139855845],[-73.985281779186749,40.785978620950075],[-73.985240032884533,40.786035858136792],[-73.985683885242182,40.786222123919686],[-73.985717529004575,40.786175994668795],[-73.985765660297687,40.786196274858618],[-73.985682871922691,40.786309786213067],[-73.985636270930442,40.786290150649279],[-73.985670722564691,40.786242911993817],[-73.98520511880038,40.786047669212785],[-73.985211035607492,40.786039554883686],[-73.985162639946992,40.786020999769754],[-73.985131636312062,40.786060297019972],[-73.985016964065125,40.78601423719563],[-73.984655078830457,40.786534741807841],[-73.985743787901043,40.786570082854738],[-73.98589227228328,40.786426529019593],[-73.985942854994988,40.786452847880334],[-73.985949561556794,40.78648711396653],[-73.985812373526713,40.786616865357047],[-73.985135209703174,40.78658761889551],[-73.984619428584324,40.786586016349787],[-73.981952458164173,40.790393724337193],[-73.972823037363767,40.803428052816756],[-73.971036786332192,40.805918478839672],[-73.966701,40.804169000000186],[-73.959647,40.801156000000113],[-73.958508540159471,40.800682279767472],[-73.95853274080838,40.800491362464697],[-73.958357552055688,40.800369095633819]]]}),
        'Upper East Side',   dynamic({"type":"Polygon","coordinates":[[[-73.943592454622546,40.782747908206574],[-73.943648235390199,40.782656161333449],[-73.943870759887162,40.781273026571704],[-73.94345932494096,40.780048275653243],[-73.943213862652243,40.779317588660199],[-73.943004239504688,40.779639495474292],[-73.942716005450905,40.779544169476175],[-73.942712374762181,40.779214856940001],[-73.942535563208608,40.779090956062532],[-73.942893408188027,40.778614093246276],[-73.942438481745029,40.777315235766039],[-73.942244919522594,40.777104088947254],[-73.942074188038887,40.776917846977142],[-73.942002667222781,40.776185317382648],[-73.942620205199006,40.775180871576474],[-73.94285645694552,40.774796600349191],[-73.94293043781397,40.774676268036011],[-73.945870899588215,40.771692257932997],[-73.946618690150586,40.77093339256956],[-73.948664164778933,40.768857624399587],[-73.950069793030679,40.767025088383498],[-73.954418260786071,40.762184104951245],[-73.95650786241211,40.760285256574043],[-73.958787773424007,40.758213471309809],[-73.973015157270069,40.764278692864671],[-73.955760332998182,40.787906554459667],[-73.944023,40.782960000000301],[-73.943592454622546,40.782747908206574]]]}),
    ];
let Coordinates = 
    datatable(longitude:real, latitude:real)
    [
        real(-73.9741), 40.7914, // Upper West Side
        real(-73.9950), 40.7340, // Greenwich Village
        real(-73.9584), 40.7688, // Upper East Side
    ];
let Level = 16;
Polygons
| extend covering = geo_polygon_to_s2cells(polygon, Level) // cover every polygon with s2 cell token array
| mv-expand covering to typeof(string)                     // expand cells array such that every row will have one cell mapped to its polygon
| join kind=inner hint.strategy=broadcast                  // assume that Polygons count is small (In some specific case)
(
    Coordinates
    | extend covering = geo_point_to_s2cell(longitude, latitude, Level) // cover point with cell
) on covering // join on the cell, this filters out rows of point and polygons where the point definitely does not belong to the polygon
| where geo_point_in_polygon(longitude, latitude, polygon) // final filtering for exact result
| project longitude, latitude, description

Output

| longitude | latitude | description |
|--|--|--|
| -73.9741 | 40.7914 | Upper West Side |
| -73.995 | 40.734 | Greenwich Village |
| -73.9584 | 40.7688 | Upper East Side |

The following query improves further on the approach above by counting storm events per US state. The join is very efficient because it doesn’t carry the polygons through the join and instead brings them back with the lookup operator.

" target="_blank">Run the query

let Level = 6;
let polygons = materialize(
    US_States
    | project StateName = tostring(features.properties.NAME), polygon = features.geometry, id = new_guid());
let tmp = 
    polygons
    | project id, covering = geo_polygon_to_s2cells(polygon, Level) 
    | mv-expand covering to typeof(string)
    | join kind=inner hint.strategy=broadcast
            (
                StormEvents
                | project lng = BeginLon, lat = BeginLat
                | project lng, lat, covering = geo_point_to_s2cell(lng, lat, Level)
            ) on covering
    | project-away covering, covering1;
tmp | lookup polygons on id
| project-away id
| where geo_point_in_polygon(lng, lat, polygon)
| summarize StormEventsCountByState = count() by StateName

Output

| StateName | StormEventsCountByState |
|--|--|
| Florida | 960 |
| Georgia | 1085 |

The following example filters out polygons that don’t intersect with the area of interest. The maximum error is the length of the S2 cell diagonal. This example is based on a polygonized Earth at night raster file.

let intersection_level_hint = 7;
let area_of_interest = dynamic({"type": "Polygon","coordinates": [[[-73.94966125488281,40.79698248639272],[-73.95841598510742,40.800426144169315],[-73.98124694824219,40.76806170936614],[-73.97283554077148,40.7645513650551],[-73.94966125488281,40.79698248639272]]]});
let area_of_interest_covering = geo_polygon_to_s2cells(area_of_interest, intersection_level_hint);
EarthAtNight
| project value = features.properties.DN, polygon = features.geometry
| extend covering = geo_polygon_to_s2cells(polygon, intersection_level_hint) // cover each polygon with S2 cell tokens
| mv-apply c = covering to typeof(string) on
(
    // mark the polygon if any of its covering cells also appears in the covering of the area of interest
    summarize is_intersects = take_anyif(1, array_index_of(area_of_interest_covering, c) != -1)
)
| where is_intersects == 1 // keep only polygons whose covering intersects the area of interest
| count

Output

Count
83

The following example counts the cells needed to cover a polygon with S2 cells of level 5.

let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print s2_cell_token_count = array_length(geo_polygon_to_s2cells(polygon, 5));

Output

s2_cell_token_count
286

The following example shows that covering a large-area polygon with small-area cells returns null.

let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print geo_polygon_to_s2cells(polygon, 30);

Output

print_0

The following example uses isnull() to confirm that covering a large-area polygon with small-area cells returns null.

let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print isnull(geo_polygon_to_s2cells(polygon, 30));

Output

print_0
1

6.42 - geo_s2cell_neighbors()

Learn how to use the geo_s2cell_neighbors() function to calculate S2 cell neighbors.

Calculates S2 cell neighbors.

Read more about S2 cell hierarchy.

Syntax

geo_s2cell_neighbors(s2cell)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| s2cell | string | ✔️ | S2 cell token value as it was calculated by geo_point_to_s2cell(). The S2 cell token maximum string length is 16 characters. |

Returns

An array of S2 cell neighbors. If the S2 Cell is invalid, the query produces a null result.

Examples

The following example calculates S2 cell neighbors.

print neighbors = geo_s2cell_neighbors('89c259')

Output

neighbors
[“89c25d”,“89c2f9”,“89c251”,“89c257”,“89c25f”,“89c25b”,“89c2f7”,“89c2f5”]

The following example calculates an array containing the input S2 cell and its neighbors.

let s2cell = '89c259';
print cells = array_concat(pack_array(s2cell), geo_s2cell_neighbors(s2cell))

Output

cells
[“89c259”,“89c25d”,“89c2f9”,“89c251”,“89c257”,“89c25f”,“89c25b”,“89c2f7”,“89c2f5”]

The following example calculates a GeoJSON geometry collection of the S2 cell polygons.

let s2cell = '89c259';
print cells = array_concat(pack_array(s2cell), geo_s2cell_neighbors(s2cell))
| mv-expand cells to typeof(string)
| project polygons = geo_s2cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project geojson = bag_pack("type", "Feature","geometry", bag_pack("type", "GeometryCollection", "geometries", arr), "properties", bag_pack("name", "polygons"))

Output

geojson
{“type”: “Feature”,“geometry”: {“type”: “GeometryCollection”,“geometries”: [
{“type”: “Polygon”,“coordinates”: [[[ -74.030012249838478, 40.8012684339439],[ -74.030012249838478, 40.7222262918358],[ -73.935982114337421, 40.708880489804564],[ -73.935982114337421, 40.787917134506841],[ -74.030012249838478, 40.8012684339439]]]},
{“type”: “Polygon”,“coordinates”: [[[ -73.935982114337421, 40.708880489804564],[ -73.935982114337421, 40.629736433321796],[ -73.841906340776248, 40.616308079144915],[ -73.841906340776248, 40.695446474556284],[ -73.935982114337421, 40.708880489804564]]]},
{“type”: “Polygon”,“coordinates”: [[[ -74.1239959854733, 40.893471289549765],[ -74.1239959854733, 40.814531536204242],[ -74.030012249838478, 40.8012684339439],[ -74.030012249838478, 40.880202851376716],[ -74.1239959854733, 40.893471289549765]]]},
{“type”: “Polygon”,“coordinates”: [[[ -74.1239959854733, 40.735483949993387],[ -74.1239959854733, 40.656328734184143],[ -74.030012249838478, 40.643076628676461],[ -74.030012249838478, 40.7222262918358],[ -74.1239959854733, 40.735483949993387]]]},
{“type”: “Polygon”,“coordinates”: [[[ -74.1239959854733, 40.814531536204242],[ -74.1239959854733, 40.735483949993387],[ -74.030012249838478, 40.7222262918358],[ -74.030012249838478, 40.8012684339439],[ -74.1239959854733, 40.814531536204242]]]},
{“type”: “Polygon”,“coordinates”: [[[ -73.935982114337421, 40.787917134506841],[ -73.935982114337421, 40.708880489804564],[ -73.841906340776248, 40.695446474556284],[ -73.841906340776248, 40.774477568182071],[ -73.935982114337421, 40.787917134506841]]]},
{“type”: “Polygon”,“coordinates”: [[[ -74.030012249838478, 40.7222262918358],[ -74.030012249838478, 40.643076628676461],[ -73.935982114337421, 40.629736433321796],[ -73.935982114337421, 40.708880489804564],[ -74.030012249838478, 40.7222262918358]]]},
{“type”: “Polygon”,“coordinates”: [[[ -74.030012249838478, 40.880202851376716],[ -74.030012249838478, 40.8012684339439],[ -73.935982114337421, 40.787917134506841],[ -73.935982114337421, 40.866846163445771],[ -74.030012249838478, 40.880202851376716]]]},
{“type”: “Polygon”,“coordinates”: [[[ -73.935982114337421, 40.866846163445771],[ -73.935982114337421, 40.787917134506841],[ -73.841906340776248, 40.774477568182071],[ -73.841906340776248, 40.853401155678846],[ -73.935982114337421, 40.866846163445771]]]}]},
“properties”: {“name”: “polygons”}}

The following example calculates the polygon union that represents the S2 cell and its neighbors.

let s2cell = '89c259';
print cells = array_concat(pack_array(s2cell), geo_s2cell_neighbors(s2cell))
| mv-expand cells to typeof(string)
| project polygons = geo_s2cell_to_polygon(cells)
| summarize arr = make_list(polygons)
| project polygon = geo_union_polygons_array(arr)

Output

polygon
{“type”: “Polygon”,“coordinates”: [[[-73.841906340776248,40.695446474556284],[-73.841906340776248,40.774477568182071],[-73.841906340776248,40.853401155678846],[-73.935982114337421,40.866846163445771],[-74.030012249838478,40.880202851376716],[-74.1239959854733,40.893471289549758],[-74.1239959854733,40.814531536204242],[-74.1239959854733,40.735483949993387],[-74.1239959854733,40.656328734184143],[-74.030012249838478,40.643076628676461],[-73.935982114337421,40.629736433321796],[-73.841906340776248,40.616308079144915],[-73.841906340776248,40.695446474556284]]]}

The following example returns true because of the invalid S2 Cell token input.

print invalid = isnull(geo_s2cell_neighbors('a'))

Output

invalid
1

6.43 - geo_s2cell_to_central_point()

Learn how to use the geo_s2cell_to_central_point() function to calculate the geospatial coordinates that represent the center of an S2 cell.

Calculates the geospatial coordinates that represent the center of an S2 cell.

Read more about S2 cell hierarchy.

Syntax

geo_s2cell_to_central_point(s2cell)

Parameters

NameTypeRequiredDescription
s2cellstring✔️S2 cell token value as it was calculated by geo_point_to_s2cell(). The S2 cell token maximum string length is 16 characters.

Returns

The geospatial coordinate values in GeoJSON Format and of a dynamic data type. If the S2 cell token is invalid, the query will produce a null result.

Examples

print point = geo_s2cell_to_central_point("1234567")
| extend coordinates = point.coordinates
| extend longitude = coordinates[0], latitude = coordinates[1]

Output

pointcoordinateslongitudelatitude
{
“type”: “Point”,
“coordinates”: [
9.86830731850408,
27.468392925827604
]
}
[
9.86830731850408,
27.468392925827604
]
9.86830731850408  27.4683929258276

The following example returns a null result because of the invalid S2 cell token input.

print point = geo_s2cell_to_central_point("a")

Output

point

6.44 - geo_s2cell_to_polygon()

Learn how to use the geo_s2cell_to_polygon() function to calculate the polygon that represents the S2 Cell rectangular area.

Calculates the polygon that represents the S2 Cell rectangular area.

Read more about S2 Cells.

Syntax

geo_s2cell_to_polygon(s2cell)

Parameters

NameTypeRequiredDescription
s2cellstring✔️S2 cell token value as it was calculated by geo_point_to_s2cell(). The S2 cell token maximum string length is 16 characters.

Returns

Polygon in GeoJSON Format and of a dynamic data type. If the s2cell is invalid, the query produces a null result.

Examples

print s2cellPolygon = geo_s2cell_to_polygon("89c259")

Output

s2cellPolygon
{
“type”: “Polygon”,
“coordinates”: [[[-74.030012249838478, 40.8012684339439], [-74.030012249838478, 40.7222262918358], [-73.935982114337421, 40.708880489804564], [-73.935982114337421, 40.787917134506841], [-74.030012249838478, 40.8012684339439]]]
}

The following example assembles GeoJSON geometry collection of S2 Cell polygons.

datatable(lng:real, lat:real)
[
    -73.956683, 40.807907,
    -73.916869, 40.818314,
    -73.989148, 40.743273,
]
| project s2_hash = geo_point_to_s2cell(lng, lat, 10)
| project s2_hash_polygon = geo_s2cell_to_polygon(s2_hash)
| summarize s2_hash_polygon_lst = make_list(s2_hash_polygon)
| project bag_pack(
    "type", "Feature",
    "geometry", bag_pack("type", "GeometryCollection", "geometries", s2_hash_polygon_lst),
    "properties", bag_pack("name", "S2 Cell polygons collection"))

Output

Column1
{
“type”: “Feature”,
“geometry”: {“type”: “GeometryCollection”, “geometries”: [
{“type”: “Polygon”, “coordinates”: [[[-74.030012249838478, 40.880202851376716], [-74.030012249838478, 40.8012684339439], [-73.935982114337421, 40.787917134506841], [-73.935982114337421, 40.866846163445771], [-74.030012249838478, 40.880202851376716]]]},
{“type”: “Polygon”, “coordinates”: [[[-73.935982114337421, 40.866846163445771], [-73.935982114337421, 40.787917134506841], [-73.841906340776248, 40.774477568182071], [-73.841906340776248, 40.853401155678846], [-73.935982114337421, 40.866846163445771]]]},
{“type”: “Polygon”, “coordinates”: [[[-74.030012249838478, 40.8012684339439], [-74.030012249838478, 40.7222262918358], [-73.935982114337421, 40.708880489804564], [-73.935982114337421, 40.787917134506841], [-74.030012249838478, 40.8012684339439]]]}]
},
“properties”: {“name”: “S2 Cell polygons collection”}
}

The following example returns a null result because of the invalid s2cell token input.

print s2cellPolygon = geo_s2cell_to_polygon("a")

Output

s2cellPolygon

6.45 - geo_simplify_polygons_array()

Learn how to use the geo_simplify_polygons_array() function to simplify polygons by replacing nearly straight chains of short edges with a single long edge on Earth.

Simplifies polygons by replacing nearly straight chains of short edges with a single long edge on Earth.

Syntax

geo_simplify_polygons_array(polygons, tolerance)

Parameters

NameTypeRequiredDescription
polygondynamic✔️Polygon or multipolygon in the GeoJSON format.
toleranceint, long, or realDefines minimum distance in meters between any two vertices. Supported values are in the range [0, ~7,800,000 meters]. If unspecified, the default value 10 is used.

Returns

Simplified polygon or a multipolygon in the GeoJSON format and of a dynamic data type, with no two vertices with distance less than tolerance. If either the polygon or tolerance is invalid, the query will produce a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.

Examples

The following example simplifies polygons with mutual borders (USA states), by removing vertices that are within a 100-meter distance from each other.

US_States
| project polygon = features.geometry
| summarize lst = make_list(polygon)
| project polygons = geo_simplify_polygons_array(lst, 100)

Output

polygons
{ “type”: “MultiPolygon”, “coordinates”: [ … ]]}

The following example returns True because one of the polygons is invalid.

datatable(polygons:dynamic)
[
    dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize arr = make_list(polygons)
| project is_invalid_polygon = isnull(geo_simplify_polygons_array(arr))

Output

is_invalid_polygon
1

The following example returns True because of the invalid tolerance.

datatable(polygons:dynamic)
[
    dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize arr = make_list(polygons)
| project is_null = isnull(geo_simplify_polygons_array(arr, -1))

Output

is_null
1

The following example returns True because the high tolerance causes the polygon to disappear.

datatable(polygons:dynamic)
[
    dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize arr = make_list(polygons)
| project is_null = isnull(geo_simplify_polygons_array(arr, 10000))

Output

is_null
1

6.46 - geo_union_lines_array()

Learn how to use the geo_union_lines_array() function to calculate the union of line strings or multiline strings on Earth.

Calculates the union of lines or multilines on Earth.

Syntax

geo_union_lines_array(lineStrings)

Parameters

NameTypeRequiredDescription
lineStringsdynamic✔️An array of lines or multilines in the GeoJSON format.

Returns

A line or a multiline in GeoJSON Format and of a dynamic data type. If any of the provided lines or multilines is invalid, the query will produce a null result.

LineString definition and constraints

dynamic({“type”: “LineString”,“coordinates”: [[lng_1,lat_1], [lng_2,lat_2], …, [lng_N,lat_N]]})

dynamic({“type”: “MultiLineString”,“coordinates”: [line_1, line_2, …, line_N]})

  • LineString coordinates array must contain at least two entries.
  • Coordinates [longitude, latitude] must be valid where longitude is a real number in the range [-180, +180] and latitude is a real number in the range [-90, +90].
  • Edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.

Examples

The following example performs geospatial union on line rows.

datatable(lines:dynamic)
[
    dynamic({"type":"LineString","coordinates":[[-73.95683884620665,40.80502891480884],[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]}),
    dynamic({"type":"LineString","coordinates":[[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311],[-73.95537436008453,40.804413741624515]]}),
    dynamic({"type":"LineString","coordinates":[[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]})
]
| summarize lines_arr = make_list(lines)
| project lines_union = geo_union_lines_array(lines_arr)

Output

lines_union
{“type”: “LineString”, “coordinates”: [[-73.956838846206651, 40.805028914808844], [-73.95633727312088, 40.8057171711177], [ -73.954891562461853, 40.80510200431312], [-73.955374360084534, 40.804413741624522]]}

The following example performs geospatial union on line columns.

datatable(line1:dynamic, line2:dynamic)
[
    dynamic({"type":"LineString","coordinates":[[-73.95683884620665,40.80502891480884],[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]}), dynamic({"type":"LineString","coordinates":[[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311],[-73.95537436008453,40.804413741624515]]})
]
| project lines_arr = pack_array(line1, line2)
| project lines_union = geo_union_lines_array(lines_arr)

Output

lines_union
{“type”: “LineString”, “coordinates”:[[-73.956838846206651, 40.805028914808844], [-73.95633727312088, 40.8057171711177], [-73.954891562461853, 40.80510200431312], [-73.955374360084534, 40.804413741624522]]}

The following example returns True because one of the lines is invalid.

datatable(lines:dynamic)
[
    dynamic({"type":"LineString","coordinates":[[-73.95683884620665,40.80502891480884],[-73.95633727312088,40.8057171711177],[-73.95489156246185,40.80510200431311]]}),
    dynamic({"type":"LineString","coordinates":[[1, 1]]})
]
| summarize lines_arr = make_list(lines)
| project invalid_union = isnull(geo_union_lines_array(lines_arr))

Output

invalid_union
True

6.47 - geo_union_polygons_array()

Learn how to use the geo_union_polygons_array() function to calculate the union of polygons or multipolygons on Earth.

Calculates the union of polygons or multipolygons on Earth.

Syntax

geo_union_polygons_array(polygons)

Parameters

NameTypeRequiredDescription
polygonsdynamic✔️An array of polygons or multipolygons in the GeoJSON format.

Returns

A polygon or a multipolygon in GeoJSON Format and of a dynamic data type. If any of the provided polygons or multipolygons is invalid, the query will produce a null result.

Polygon definition and constraints

dynamic({“type”: “Polygon”,“coordinates”: [ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N ]})

dynamic({“type”: “MultiPolygon”,“coordinates”: [[ LinearRingShell, LinearRingHole_1, …, LinearRingHole_N], …, [LinearRingShell, LinearRingHole_1, …, LinearRingHole_M]]})

  • LinearRingShell is required and defined as a counterclockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be only one shell.
  • LinearRingHole is optional and defined as a clockwise ordered array of coordinates [[lng_1,lat_1],…,[lng_i,lat_i],…,[lng_j,lat_j],…,[lng_1,lat_1]]. There can be any number of interior rings and holes.
  • LinearRing vertices must be distinct with at least three coordinates. The first coordinate must be equal to the last. At least four entries are required.
  • Coordinates [longitude, latitude] must be valid. Longitude must be a real number in the range [-180, +180] and latitude must be a real number in the range [-90, +90].
  • LinearRingShell encloses at most half of the sphere. LinearRing divides the sphere into two regions. The smaller of the two regions will be chosen.
  • LinearRing edge length must be less than 180 degrees. The shortest edge between the two vertices will be chosen.
  • LinearRings must not cross and must not share edges. LinearRings may share vertices.

Examples

The following example performs geospatial union on polygon rows.

datatable(polygons:dynamic)
[
    dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.97335,40.77274],[-73.9936,40.76630],[-73.97171,40.75655],[-73.97335,40.77274]]]})
]
| summarize polygons_arr = make_list(polygons)
| project polygons_union = geo_union_polygons_array(polygons_arr)

Output

polygons_union
{“type”:“Polygon”,“coordinates”:[[[-73.972599326729608,40.765330371902991],[-73.960302383706178,40.782140794645024],[-73.9577,40.7789],[-73.94622,40.79249],[-73.9526593223173,40.792584227716468],[-73.9495,40.7969],[-73.95807,40.80068],[-73.9639277517478,40.792748258673875],[-73.96888,40.792819999999992],[-73.9662719791645,40.7895734224338],[-73.9803360309571,40.770518810606404],[-73.9936,40.7663],[-73.97171,40.756550000000004],[-73.972599326729608,40.765330371902991]]]}

The following example performs geospatial union on polygon columns.

datatable(polygon1:dynamic, polygon2:dynamic)
[
    dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}), dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249],[-73.96888,40.79282],[-73.9577,40.7789],[-73.94622,40.79249]]]})
]
| project polygons_arr = pack_array(polygon1, polygon2)
| project polygons_union = geo_union_polygons_array(polygons_arr)

Output

polygons_union
{“type”:“Polygon”,“coordinates”:[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.9639277517478,40.792748258673875],[-73.96888,40.792819999999992],[-73.9662719791645,40.7895734224338],[-73.98201,40.76825],[-73.97317,40.76455],[-73.960302383706178,40.782140794645024],[-73.9577,40.7789],[-73.94622,40.79249],[-73.9526593223173,40.792584227716468],[-73.9495,40.7969]]]}

The following example returns True because one of the polygons is invalid.

datatable(polygons:dynamic)
[
    dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807,40.80068],[-73.98201,40.76825],[-73.97317,40.76455],[-73.9495,40.7969]]]}),
    dynamic({"type":"Polygon","coordinates":[[[-73.94622,40.79249]]]})
]
| summarize polygons_arr = make_list(polygons)
| project invalid_union = isnull(geo_union_polygons_array(polygons_arr))

Output

invalid_union
True

6.48 - Geospatial data visualizations

Learn how to visualize geospatial data.

Geospatial data can be visualized as part of your query using the render operator as points, pies, or bubbles on a map.

Visualize points on a map

You can visualize points using either [Longitude, Latitude] columns or a GeoJSON column. Using a series column is optional. The [Longitude, Latitude] pair defines each point, in that order.

Example: Visualize points on a map

The following example finds storm events and visualizes 100 on a map.

StormEvents
| take 100
| project BeginLon, BeginLat
| render scatterchart with (kind = map)

Screenshot of sample storm events on a map.

Example: Visualize multiple series of points on a map

The following example visualizes multiple series of points, where the [Longitude, Latitude] pair defines each point, and a third column defines the series. In this example, the series is EventType.

StormEvents
| take 100
| project BeginLon, BeginLat, EventType
| render scatterchart with (kind = map)

Screenshot of sample storm series events on a map.

Example: Visualize series of points on data with multiple columns

The following example visualizes a series of points on a map. If you have multiple columns in the result, you must specify the columns to be used for xcolumn (Longitude), ycolumn (Latitude), and series.

StormEvents
| take 100
| render scatterchart with (kind = map, xcolumn = BeginLon, ycolumns = BeginLat, series = EventType)

Screenshot of sample storm series events using arguments.

Example: Visualize points on a map defined by GeoJSON dynamic values

The following example visualizes points on the map using GeoJSON dynamic values to define the points.

StormEvents
| project BeginLon, BeginLat
| summarize by hash=geo_point_to_s2cell(BeginLon, BeginLat, 5)
| project geo_s2cell_to_central_point(hash)
| render scatterchart with (kind = map)

Screenshot of sample storm GeoJSON events.

Visualization of pies or bubbles on a map

You can visualize pies or bubbles using either [Longitude, Latitude] columns or a GeoJSON column. These visualizations can be created with color or numeric axes.

Example: Visualize bubbles using a color axis

The following example shows storm events aggregated by S2 cells. The chart aggregates events in bubbles by location in one color.

StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_circle(BeginLon, BeginLat, real(-81.3891), 28.5346, 1000 * 100)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat)
| project geo_s2cell_to_central_point(hash), count_
| extend Events = "count"
| render piechart with (kind = map)

Screenshot of storm events on a bubble map.

Example: Visualize pie charts by location

The following example shows storm events aggregated by S2 cells. The chart aggregates events by event type in pie charts by location.

StormEvents
| project BeginLon, BeginLat, EventType
| where geo_point_in_circle(BeginLon, BeginLat, real(-81.3891), 28.5346, 1000 * 100)
| summarize count() by EventType, hash = geo_point_to_s2cell(BeginLon, BeginLat)
| project geo_s2cell_to_central_point(hash), EventType, count_
| render piechart with (kind = map)

Screenshot of storm events on a pie map in Kusto.Explorer.

6.49 - Geospatial grid system

Learn how to use geospatial grid systems to cluster geospatial data.

Geospatial data can be analyzed efficiently using grid systems to create geospatial clusters. You can use geospatial tools to aggregate, cluster, partition, reduce, join, and index geospatial data. These tools improve query runtime performance, reduce stored data size, and visualize aggregated geospatial data.

The following methods of geospatial clustering are supported:

  • Geohash
  • S2 Cell
  • H3 Cell

The core functionalities of these methods are:

  • Calculate the hash/index/cell token of a geospatial coordinate. Different geospatial coordinates that belong to the same cell have the same cell token value.
  • Calculate the center point of a hash/index/cell token. This point is useful because it can represent all the values in the cell.
  • Calculate the cell polygon. Calculating cell polygons is useful for cell visualization or other calculations, for example, distance or point-in-polygon checks.
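
For example, the following minimal sketch applies these three operations to a single coordinate using the S2 cell functions described later in this article. The coordinate and the S2 level are arbitrary illustrative values.

print lng = -73.9857, lat = 40.7484
| extend s2_token = geo_point_to_s2cell(lng, lat, 12)       // cell token of the coordinate
| extend s2_center = geo_s2cell_to_central_point(s2_token)  // center point of the cell
| extend s2_polygon = geo_s2cell_to_polygon(s2_token)       // polygon of the cell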

Compare methods

Criteria | Geohash | S2 Cell | H3 Cell
Levels of hierarchy | 18 | 31 | 16
Cell shape | Rectangle | Rectangle | Hexagon
Cell edges | straight | geodesic | straight
Projection system | None. Encodes latitude and longitude. | Cube face centered quadratic transform. | Icosahedron face centered gnomonic.
Neighbors count | 8 | 8 | 6
Noticeable feature | Common prefixes indicate points proximity. | 31 hierarchy levels. | Cell shape is hexagonal.
Performance | Superb | Superb | Fast
Cover polygon with cells | Not supported | Supported | Not supported
Cell parent | Not supported | Not supported | Supported
Cell children | Not supported | Not supported | Supported
Cell rings | Not supported | Not supported | Supported
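
To see how the three systems differ in practice, the following minimal sketch computes a token for the same point in each system. The coordinate and the resolution levels are arbitrary illustrative values.

print lng = -73.9857, lat = 40.7484
| extend geohash = geo_point_to_geohash(lng, lat, 6)   // Geohash token
| extend s2cell  = geo_point_to_s2cell(lng, lat, 12)   // S2 cell token
| extend h3cell  = geo_point_to_h3cell(lng, lat, 8)    // H3 cell token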

Geohash functions

Function Name
geo_point_to_geohash()
geo_geohash_to_central_point()
geo_geohash_neighbors()
geo_geohash_to_polygon()

S2 Cell functions

Function Name
geo_point_to_s2cell()
geo_s2cell_to_central_point()
geo_s2cell_neighbors()
geo_s2cell_to_polygon()
geo_polygon_to_s2cells()

H3 Cell functions

Function Name
geo_point_to_h3cell()
geo_h3cell_to_central_point()
geo_h3cell_neighbors()
geo_h3cell_to_polygon()
geo_h3cell_parent()
geo_h3cell_children()
geo_h3cell_rings()

7 - Graph operators

7.1 - Best practices for Kusto Query Language (KQL) graph semantics

Learn about the best practices for Kusto Query Language (KQL) graph semantics.

Best practices for Kusto Query Language (KQL) graph semantics

This article explains how to use the graph semantics feature in KQL effectively and efficiently for different use cases and scenarios. It shows how to create and query graphs with the syntax and operators, and how to integrate them with other KQL features and functions. It also helps users avoid common pitfalls or errors, such as creating graphs that exceed memory or performance limits, or applying unsuitable or incompatible filters, projections, or aggregations.

Size of graph

The make-graph operator creates an in-memory representation of a graph. It consists of the graph structure itself and its properties. When making a graph, use appropriate filters, projections, and aggregations to select only the relevant nodes and edges and their properties.

The following example shows how to reduce the number of nodes and edges and their properties. In this scenario, Bob changed manager from Alice to Eve, and the user only wants to see the latest state of the graph for their organization. To reduce the size of the graph, the nodes are first filtered by the organization property, and then that property is removed from the graph using the project-away operator. The same happens for the edges. Then the summarize operator, together with arg_max, is used to get the last known state of the graph.

let allEmployees = datatable(organization: string, name:string, age:long)
[
  "R&D", "Alice", 32,
  "R&D","Bob", 31,
  "R&D","Eve", 27,
  "R&D","Mallory", 29,
  "Marketing", "Alex", 35
];
let allReports = datatable(employee:string, manager:string, modificationDate: datetime)
[
  "Bob", "Alice", datetime(2022-05-23),
  "Bob", "Eve", datetime(2023-01-01),
  "Eve", "Mallory", datetime(2022-05-23),
  "Alice", "Dave", datetime(2022-05-23)
];
let filteredEmployees =
    allEmployees
    | where organization == "R&D"
    | project-away age, organization;
let filteredReports =
    allReports
    | summarize arg_max(modificationDate, *) by employee
    | project-away modificationDate;
filteredReports
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
  where employee.name == "Bob"
  project employee = employee.name, topManager = manager.name

Output

employeetopManager
BobMallory

Last known state of the graph

The Size of graph example demonstrated how to get the last known state of the edges of a graph by using the summarize operator and the arg_max aggregation function. Obtaining the last known state is a compute-intensive operation.

Consider creating a materialized view to improve the query performance, as follows:

  1. Create tables that have some notion of version as part of their model. We recommend using a datetime column that you can later use to create a graph time series.

    .create table employees (organization: string, name:string, stateOfEmployment:string, properties:dynamic, modificationDate:datetime)
    
    .create table reportsTo (employee:string, manager:string, modificationDate: datetime)
    
  2. Create a materialized view for each table and use the arg_max aggregation function to determine the last known state of employees and the reportsTo relation.

    .create materialized-view employees_MV on table employees
    {
        employees
        | summarize arg_max(modificationDate, *) by name
    }
    
    .create materialized-view reportsTo_MV on table reportsTo
    {
        reportsTo
        | summarize arg_max(modificationDate, *) by employee
    }
    
  3. Create two functions that ensure that only the materialized component of the materialized view is used and other filters and projections are applied.

    .create function currentEmployees () {
        materialized_view('employees_MV')
        | where stateOfEmployment == "employed"
    }
    
    .create function reportsTo_lastKnownState () {
        materialized_view('reportsTo_MV')
        | project-away modificationDate
    }
    

The resulting query, which uses the materialized views, is faster and more efficient for larger graphs. It also enables higher-concurrency and lower-latency queries for the latest state of the graph. If needed, the user can still query the graph history based on the employees and reportsTo tables.

let filteredEmployees =
    currentEmployees
    | where organization == "R&D"
    | project-away organization;
reportsTo_lastKnownState
| make-graph employee --> manager with filteredEmployees on name
| graph-match (employee)-[hasManager*2..5]-(manager)
  where employee.name == "Bob"
  project employee = employee.name, reportingPath = map(hasManager, manager)

Graph time travel

Some scenarios require you to analyze data based on the state of a graph at a specific point in time. Graph time travel uses a combination of time filters and the summarize operator with the arg_max aggregation function.

The following KQL statement creates a function with a parameter that defines the interesting point in time for the graph. It returns a ready-made graph.

.create function graph_time_travel (interestingPointInTime:datetime ) {
    let filteredEmployees =
        employees
        | where modificationDate < interestingPointInTime
        | summarize arg_max(modificationDate, *) by name;
    let filteredReports =
        reportsTo
        | where modificationDate < interestingPointInTime
        | summarize arg_max(modificationDate, *) by employee
        | project-away modificationDate;
    filteredReports
    | make-graph employee --> manager with filteredEmployees on name
}

With the function in place, the user can craft a query to get the top manager of Bob based on the graph in June 2022.

graph_time_travel(datetime(2022-06-01))
| graph-match (employee)-[hasManager*2..5]-(manager)
  where employee.name == "Bob"
  project employee = employee.name, topManager = manager.name

Output

employeetopManager
BobDave

Dealing with multiple node and edge types

Occasionally, you might need to contextualize time series data with a graph that has multiple node types. You could approach the problem by creating a general-purpose property graph that is based on a canonical model, such as the following.

  • nodes
    • nodeId (string)
    • label (string)
    • properties (dynamic)
  • edges
    • source (string)
    • destination (string)
    • label (string)
    • properties (dynamic)

The following example shows how to transform the data into a canonical model and how to query it. The base tables for the nodes and edges of the graph have different schemas.

This scenario involves a factory manager who wants to find out why equipment isn’t working well and who is responsible for fixing it. The manager decides to use a graph that combines the asset graph of the production floor and the maintenance staff hierarchy which changes every day.

The following graph shows the relations between assets and their time series, such as speed, temperature, and pressure. The operators and the assets, such as pump, are connected via the operates edge. The operators themselves report up to management.

Infographic on the property graph scenario.

The data for those entities can be stored directly in your cluster or acquired using query federation to a different service, such as Azure Cosmos DB, Azure SQL, or Azure Digital Twin. To illustrate the example, the following tabular data is created as part of the query:

let sensors = datatable(sensorId:string, tagName:string, unitOfMeasure:string)
[
  "1", "temperature", "°C",
  "2", "pressure", "Pa",
  "3", "speed", "m/s"
];
let timeseriesData = datatable(sensorId:string, timestamp:string, value:double, anomaly: bool )
[
    "1", datetime(2023-01-23 10:00:00), 32, false,
    "1", datetime(2023-01-24 10:00:00), 400, true,
    "3", datetime(2023-01-24 09:00:00), 9, false
];
let employees = datatable(name:string, age:long)
[
  "Alice", 32,
  "Bob", 31,
  "Eve", 27,
  "Mallory", 29,
  "Alex", 35,
  "Dave", 45
];
let allReports = datatable(employee:string, manager:string)
[
  "Bob", "Alice",
  "Alice", "Dave",
  "Eve", "Mallory",
  "Alex", "Dave"
];
let operates = datatable(employee:string, machine:string, timestamp:datetime)
[
  "Bob", "Pump", datetime(2023-01-23),
  "Eve", "Pump", datetime(2023-01-24),
  "Mallory", "Press", datetime(2023-01-24),
  "Alex", "Conveyor belt", datetime(2023-01-24),
];
let assetHierarchy = datatable(source:string, destination:string)
[
  "1", "Pump",
  "2", "Pump",
  "Pump", "Press",
  "3", "Conveyor belt"
];

The employees, sensors, and other entities and relationships don’t share a canonical data model. You can use the union operator to combine and canonize the data.

The following query joins the sensor data with the time series data to find the sensors that have abnormal readings. Then, it uses a projection to create a common model for the graph nodes.

let nodes =
    union
        (
            sensors
            | join kind=leftouter
            (
                timeseriesData
                | summarize hasAnomaly=max(anomaly) by sensorId
            ) on sensorId
            | project nodeId = sensorId, label = "tag", properties = pack_all(true)
        ),
        ( employees | project nodeId = name, label = "employee", properties = pack_all(true));

The edges are transformed in a similar way.

let edges =
    union
        ( assetHierarchy | extend label = "hasParent" ),
        ( allReports | project source = employee, destination = manager, label = "reportsTo" ),
        ( operates | project source = employee, destination = machine, properties = pack_all(true), label = "operates" );

With the canonized nodes and edges data, you can create a graph using the make-graph operator, as follows:

let graph = edges
| make-graph source --> destination with nodes on nodeId;

Once the graph is created, define the path pattern and project the required information. The pattern starts at a tag node, followed by a variable length edge to an asset. That asset is operated by an operator that reports to a top manager via a variable length edge called reportsTo. The constraints section of the graph-match operator (the where clause in this instance) reduces the tags to the ones that have an anomaly and were operated on a specific day.

graph
| graph-match (tag)-[hasParent*1..5]->(asset)<-[operates]-(operator)-[reportsTo*1..5]->(topManager)
    where tag.label=="tag" and tobool(tag.properties.hasAnomaly) and
        startofday(todatetime(operates.properties.timestamp)) == datetime(2023-01-24)
        and topManager.label=="employee"
    project
        tagWithAnomaly = tostring(tag.properties.tagName),
        impactedAsset = asset.nodeId,
        operatorName = operator.nodeId,
        responsibleManager = tostring(topManager.nodeId)

Output

tagWithAnomalyimpactedAssetoperatorNameresponsibleManager
temperaturePumpEveMallory

The projection in graph-match outputs the information that the temperature sensor showed an anomaly on the specified day. It was operated by Eve who ultimately reports to Mallory. With this information, the factory manager can reach out to Eve and potentially Mallory to get a better understanding of the anomaly.

7.2 - Graph operators

Learn how to use KQL graph operators.

Kusto Query Language (KQL) graph operators enable graph analysis of data by representing tabular data as a graph with nodes and edges. This setup lets us use graph operations to study the connections and relationships between different data points.

Graph analysis typically comprises the following steps (a minimal end-to-end sketch follows the list):

  1. Prepare and preprocess the data using tabular operators
  2. Build a graph from the prepared tabular data using make-graph
  3. Perform graph analysis using graph-match
  4. Transform the results of the graph analysis back into tabular form using graph-to-table
  5. Continue the query with tabular operators
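
The following minimal sketch walks through the steps above on a small, hypothetical set of employee-manager pairs. Step 4 is skipped because graph-match already returns a tabular result.

let employees = datatable(name:string, age:long)
[
  "Alice", 32,
  "Bob", 31,
  "Eve", 27
];
let reports = datatable(employee:string, manager:string)
[
  "Bob", "Alice",
  "Eve", "Bob"
];
reports                                                    // 1. prepared tabular data
| make-graph employee --> manager with employees on name   // 2. build the graph
| graph-match (worker)-[reportsTo*1..3]->(boss)            // 3. search for a pattern
  project worker = worker.name, boss = boss.name
| summarize reportCount = count() by boss                  // 5. continue with tabular operators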

Supported graph operators

The following table describes the supported graph operators.

OperatorDescription
make-graphBuilds a graph from tabular data.
graph-matchSearches for patterns in a graph.
graph-to-tableBuilds nodes or edges tables from a graph.
graph-shortest-pathsFinds the shortest paths from a given set of source nodes to a set of target nodes.
graph-mark-componentsFinds and marks all connected components.

Graph model

A graph is modeled as a directed property graph that represents the data as a network of vertices, or nodes, connected by edges. Both nodes and edges can have properties that store more information about them, and a node in the graph must have a unique identifier. A pair of nodes can have multiple edges between them that have different properties or direction. There’s no special distinction of labels in the graph, and any property can act as a label.

Graph lifetime

A graph is a transient object. It’s built in each query that contains graph operators and ceases to exist once the query is completed. To persist a graph, it has to first be transformed back into tabular form and then stored as edges or nodes tables.
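
For example, the following sketch shows the round trip on a small, hypothetical edge table; the nodes table can be exported in the same way, and storing the exported tables (for example, through an ingestion command) is a separate step outside the query.

let relationships = datatable(source:string, destination:string, kind:string)
[
  "Alice", "Bob", "manages",
  "Bob", "Eve", "manages"
];
relationships
| make-graph source --> destination   // transient, in-memory graph
| graph-to-table edges                // back to tabular form, ready to be stored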

Limitations and recommendations

The graph object is built in memory on the fly for each graph query. As such, there’s a performance cost for building the graph and a limit to the size of the graph that can be built.

Although it isn’t strictly enforced, we recommend building graphs with at most 10 million elements (nodes and edges). The actual memory limit for the graph is determined by query operators memory limit.

7.3 - graph-mark-components operator (Preview)

Learn how to use the graph-mark-components operator to find and mark all connected components of a graph.

The graph-mark-components operator finds all connected components of a graph and marks each node with a component identifier.

Syntax

G | graph-mark-components [kind = Kind] [with_component_id = ComponentId]

Parameters

NameTypeRequiredDescription
Gstring✔️The graph source.
KindstringThe connected component kind, either weak (default) or strong. A weak component is a set of nodes connected by a path, ignoring the direction of edges. A strong component is a set of nodes connected in both directions, considering the edges’ directions.
ComponentIdstringThe property name that denotes the component identifier. The default property name is ComponentId.

Returns

The graph-mark-components operator returns a graph result, where each node has a component identifier in the ComponentId property. The identifier is a zero-based consecutive index of the components. Each component index is chosen arbitrarily and might not be consistent across runs.

Examples

The examples in this section show how to use the syntax to help you get started.

Find families by their relationships

The following example creates a graph from a set of child-parent pairs and identifies connected components using a family identifier.

let ChildOf = datatable(child:string, parent:string) 
[ 
  "Alice", "Bob",  
  "Carol", "Alice",  
  "Carol", "Dave",  
  "Greg", "Alice",  
  "Greg", "Dave",  
  "Howard", "Alice",  
  "Howard", "Dave",  
  "Eve", "Frank",  
  "Frank", "Mallory",
  "Eve", "Kirk",
]; 
ChildOf 
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-to-table nodes

Output

namefamily
Alice0
Bob0
Carol0
Dave0
Greg0
Howard0
Eve1
Frank1
Mallory1
Kirk1

Find a greatest common ancestor for each family

The following example uses the connected component family identifier and the graph-match operator to identify the greatest ancestor of each family in a set of child-parent data.

let ChildOf = datatable(child:string, parent:string) 
[ 
  "Alice", "Bob",  
  "Carol", "Alice",  
  "Carol", "Dave",  
  "Greg", "Alice",  
  "Greg", "Dave",  
  "Howard", "Alice",  
  "Howard", "Dave",  
  "Eve", "Frank",  
  "Frank", "Mallory",
  "Eve", "Kirk",
]; 
ChildOf 
| make-graph child --> parent with_node_id=name
| graph-mark-components with_component_id = family
| graph-match (descendant)-[childOf*1..5]->(ancestor)
  project name = ancestor.name, lineage = map(childOf, child), family = ancestor.family
| summarize (generations, name) = arg_max(array_length(lineage),name) by family

Output

family | generations | name
1 | 2 | Mallory
0 | 2 | Bob

7.4 - graph-match operator

Learn how to use the graph-match operator to search for all occurrences of a graph pattern in a graph.

The graph-match operator searches for all occurrences of a graph pattern in an input graph source.

Syntax

G | graph-match [cycles = CyclesOption] Pattern [where Constraints] project [ColumnName =] Expression [, …]

Parameters

NameTypeRequiredDescription
Gstring✔️The input graph source.
Patternstring✔️One or more comma delimited sequences of graph node elements connected by graph edge elements using graph notations. See Graph pattern notation.
ConstraintsstringA Boolean expression composed of properties of named variables in the Pattern. Each graph element (node/edge) has a set of properties that were attached to it during the graph construction. The constraints define which elements (nodes and edges) are matched by the pattern. A property is referenced by the variable name followed by a dot (.) and the property name.
Expressionstring✔️The project clause converts each pattern to a row in a tabular result. The project expressions must be scalar and reference properties of named variables defined in the Pattern. A property is referenced by the variable name followed by a dot (.) and the attribute name.
CyclesOptionstringControls whether cycles are matched in the Pattern, allowed values: all, none, unique_edges. If all is specified, then all cycles are matched, if none is specified cycles aren’t matched, if unique_edges (default) is specified, cycles are matched but only if the cycles don’t include the same edge more than once.

Graph pattern notation

The following table shows the supported graph notation:

ElementNamed variableAnonymous
Node(n)()
Directed edge: left to right-[e]->-->
Directed edge: right to left<-[e]-<--
Any direction edge-[e]---
Variable length edge-[e*3..5]--[*3..5]-

Variable length edge

A variable length edge allows a specific pattern to be repeated multiple times within defined limits. This type of edge is denoted by an asterisk (*), followed by the minimum and maximum occurrence values in the format min..max. Both the minimum and maximum values must be integer scalars. Any sequence of edges falling within this occurrence range can match the variable edge of the pattern, if all the edges in the sequence satisfy the constraints outlined in the where clause.

Multiple sequences

Multiple comma delimited sequences are used to express nonlinear patterns. To describe the connection between different sequences, they have to share one or more variable name of a node. For example, to represent a star pattern with node n at the center connected to nodes a,b,c, and d, the following pattern could be used:

(a)--(n)--(b),(c)--(n)--(d)

Only single connected component patterns are supported.

Returns

The graph-match operator returns a tabular result, where each record corresponds to a match of the pattern in the graph.
The returned columns are defined in the operator’s project clause using properties of edges and/or nodes defined in the pattern. Properties and functions of properties of variable length edges are returned as a dynamic array, each value in the array corresponds to an occurrence of the variable length edge.

Examples

The examples in this section show how to use the syntax to help you get started.

All employees in a manager’s organization

The following example represents an organizational hierarchy. It demonstrates how a variable length edge could be used to find employees of different levels of the hierarchy in a single query. The nodes in the graph represent employees and the edges are from an employee to their manager. After we build the graph using make-graph, we search for employees in Alice’s organization that are younger than 30.

let employees = datatable(name:string, age:long) 
[ 
  "Alice", 32,  
  "Bob", 31,  
  "Eve", 27,  
  "Joe", 29,  
  "Chris", 45, 
  "Alex", 35,
  "Ben", 23,
  "Richard", 39,
]; 
let reports = datatable(employee:string, manager:string) 
[ 
  "Bob", "Alice",  
  "Chris", "Alice",  
  "Eve", "Bob",
  "Ben", "Chris",
  "Joe", "Alice", 
  "Richard", "Bob"
]; 
reports 
| make-graph employee --> manager with employees on name 
| graph-match (alice)<-[reports*1..5]-(employee)
  where alice.name == "Alice" and employee.age < 30
  project employee = employee.name, age = employee.age, reportingPath = map(reports, manager)

Output

employeeagereportingPath
Joe29[
“Alice”
]
Eve27[
“Alice”,
“Bob”
]
Ben23[
“Alice”,
“Chris”
]

Attack path

The following example builds a graph from the Actions and Entities tables. The entities are people and systems, and the actions describe different relations between entities. Following the make-graph operator that builds the graph is a call to graph-match with a graph pattern that searches for attack paths to the "Apollo" system.

let Entities = datatable(name:string, type:string, age:long) 
[ 
  "Alice", "Person", 23,  
  "Bob", "Person", 31,  
  "Eve", "Person", 17,  
  "Mallory", "Person", 29,  
  "Apollo", "System", 99 
]; 
let Actions = datatable(source:string, destination:string, action_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Apollo", "trusts",  
  "Bob", "Apollo", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
Actions 
| make-graph source --> destination with Entities on name 
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo) 
  where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission" 
  project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name

Output

AttackerCompromisedSystem
MalloryBobApollo

Star pattern

The following example is similar to the previous attack path example, but with an extra constraint: we want the compromised entity to also communicate with Alice. The graph-match pattern prefix is the same as the previous example and we add another sequence with the compromised as a link between the sequences.

let Entities = datatable(name:string, type:string, age:long) 
[ 
  "Alice", "Person", 23,  
  "Bob", "Person", 31,  
  "Eve", "Person", 17,  
  "Mallory", "Person", 29,  
  "Apollo", "System", 99 
]; 
let Actions = datatable(source:string, destination:string, action_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Apollo", "trusts",  
  "Bob", "Apollo", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
Actions 
| make-graph source --> destination with Entities on name 
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(apollo), (compromised)-[communicates]-(alice) 
  where mallory.name == "Mallory" and apollo.name == "Apollo" and attacks.action_type == "attacks" and hasPermission.action_type == "hasPermission" and alice.name == "Alice"
  project Attacker = mallory.name, Compromised = compromised.name, System = apollo.name

Output

AttackerCompromisedSystem
MalloryBobApollo

7.5 - graph-shortest-paths Operator (Preview)

Learn how to use the graph-shortest-paths operator to efficiently find the shortest paths from a given set of source nodes to a set of target nodes within a graph

The graph-shortest-paths operator finds the shortest paths between a set of source nodes and a set of target nodes in a graph and returns a table with the results.

Syntax

G | graph-shortest-paths [output = OutputOption] Pattern where Predicate project [ColumnName =] Expression [, …]

Parameters

NameTypeRequiredDescription
Gstring✔️The graph source, typically the output from a make-graph operation.
Patternstring✔️A path pattern that describes the path to find. Patterns must include at least one variable length edge and can’t contain multiple sequences.
PredicateexpressionA boolean expression that consists of properties of named variables in the pattern and constants.
Expressionexpression✔️A scalar expression that defines the output row for each found path, using constants and references to properties of named variables in the pattern.
OutputOptionstringSpecifies the search output as any (default) or all. Output is specified as any for a single shortest path per source/target pair and all for all shortest paths of equal minimum length.

Path pattern notation

The following table shows the supported path pattern notations.

ElementNamed variableAnonymous element
Node(n)()
Directed edge from left to right-[e]->-->
Directed edge from right to left<-[e]-<--
Any direction edge-[e]---
Variable length edge-[e*3..5]--[*3..5]-

Variable length edge

A variable length edge allows a specific pattern to repeat multiple times within defined limits. An asterisk (*) denotes this type of edge, followed by the minimum and maximum occurrence values in the format min..max. These values must be integer scalars. Any sequence of edges within this range can match the variable edge of the pattern, provided all the edges in the sequence meet the where clause constraints.

Returns

The graph-shortest-paths operator returns a tabular result, where each record corresponds to a path found in the graph. The returned columns are defined in the operator’s project clause using properties of nodes and edges defined in the pattern. Properties and functions of properties of variable length edges are returned as a dynamic array. Each value in the array corresponds to an occurrence of the variable length edge.

Examples

This section provides practical examples demonstrating how to use the graph-shortest-paths operator in different scenarios.

Find any shortest path between two train stations

The following example demonstrates how to use the graph-shortest-paths operator to find the shortest path between two stations in a transportation network. The query constructs a graph from the data in connections and finds the shortest path from the "South-West" to the "North" station, considering paths up to five connections long. Since the default output is any, it finds any shortest path.

let connections = datatable(from_station:string, to_station:string, line:string) 
[ 
  "Central", "North", "red",
  "North", "Central", "red", 
  "Central", "South",  "red", 
  "South", "Central",  "red", 
  "South", "South-West", "red", 
  "South-West", "South", "red", 
  "South-West", "West", "red", 
  "West", "South-West", "red", 
  "Central", "East", "blue", 
  "East", "Central", "blue", 
  "Central", "West", "blue",
  "West", "Central", "blue",
]; 
connections 
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths (start)-[connections*1..5]->(destination)
  where start.station == "South-West" and destination.station == "North"
  project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station

Output

frompathlineto
South-West[
“South”,
“Central”,
“North”
]
[
“red”,
“red”,
“red”
]
North

Find all shortest paths between two train stations

The following example, like the previous example, finds the shortest paths in a transportation network. However, it uses output=all, so returns all shortest paths.

let connections = datatable(from_station:string, to_station:string, line:string) 
[ 
  "Central", "North", "red",
  "North", "Central", "red", 
  "Central", "South",  "red", 
  "South", "Central",  "red", 
  "South", "South-West", "red", 
  "South-West", "South", "red", 
  "South-West", "West", "red", 
  "West", "South-West", "red", 
  "Central", "East", "blue", 
  "East", "Central", "blue", 
  "Central", "West", "blue",
  "West", "Central", "blue",
]; 
connections 
| make-graph from_station --> to_station with_node_id=station
| graph-shortest-paths output=all (start)-[connections*1..5]->(destination)
  where start.station == "South-West" and destination.station == "North"
  project from = start.station, path = map(connections, to_station), line = map(connections, line), to = destination.station

Output

frompathlineto
South-West[
“South”,
“Central”,
“North”
]
[
“red”,
“red”,
“red”
]
North
South-West[
“West”,
“Central”,
“North”
]
[
“red”,
“blue”,
“red”
]
North

7.6 - graph-to-table operator

Learn how to use the graph-to-table operator to export nodes or edges from a graph to tables.

The graph-to-table operator exports nodes or edges from a graph to tables.

Syntax

Nodes

G | graph-to-table nodes [ with_node_id=ColumnName ]

Edges

G | graph-to-table edges [ with_source_id=ColumnName ] [ with_target_id=ColumnName ] [ as TableName ]

Nodes and edges

G | graph-to-table nodes as NodesTableName [ with_node_id=ColumnName ], edges as EdgesTableName [ with_source_id=ColumnName ] [ with_target_id=ColumnName ]

Parameters

NameTypeRequiredDescription
Gstring✔️The input graph source.
NodesTableNamestringThe name of the exported nodes table.
EdgesTableNamestringThe name of the exported edges table.
ColumnNamestringExport the node hash ID, source node hash ID, or target node hash ID with the given column name.

Returns

Nodes

The graph-to-table operator returns a tabular result, in which each row corresponds to a node in the source graph. The returned columns are the node’s properties. When with_node_id is provided, the node hash column is of long type.

Edges

The graph-to-table operator returns a tabular result, in which each row corresponds to an edge in the source graph. The returned columns are the edge’s properties. When with_source_id or with_target_id are provided, the node hash column is of long type.

Nodes and edges

The graph-to-table operator returns two tabular results, matching the previous descriptions.

Examples

The following examples use the make-graph operator to build a graph from edges and nodes tables. The nodes represent people and systems, and the edges are different relations between nodes. Then, each example shows a different usage of graph-to-table.

Get edges

In this example, the graph-to-table operator exports the edges from a graph to a table. The with_source_id and with_target_id parameters export the node hash for source and target nodes of each edge.

let nodes = datatable(name:string, type:string, age:long) 
[ 
	"Alice", "Person", 23,  
	"Bob", "Person", 31,  
	"Eve", "Person", 17,  
	"Mallory", "Person", 29,  
	"Trent", "System", 99 
]; 
let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
	"Alice", "Bob", "communicatesWith",  
	"Alice", "Trent", "trusts",  
	"Bob", "Trent", "hasPermission",  
	"Eve", "Alice", "attacks",  
	"Mallory", "Alice", "attacks",  
	"Mallory", "Bob", "attacks"  
]; 
edges 
| make-graph source --> destination with nodes on name
| graph-to-table edges with_source_id=SourceId with_target_id=TargetId

Output

SourceIdTargetIdsourcedestinationedge_type
-3122868243544336885-7133945255344544237AliceBobcommunicatesWith
-31228682435443368852533909231875758225AliceTrenttrusts
-71339452553445442372533909231875758225BobTrenthasPermission
4363395278938690453-3122868243544336885EveAliceattacks
3855580634910899594-3122868243544336885MalloryAliceattacks
3855580634910899594-7133945255344544237MalloryBobattacks

Get nodes

In this example, the graph-to-table operator exports the nodes from a graph to a table. The with_node_id parameter exports the node hash.

let nodes = datatable(name:string, type:string, age:long) 
[ 
	"Alice", "Person", 23,  
	"Bob", "Person", 31,  
	"Eve", "Person", 17,
	"Trent", "System", 99
]; 
let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
	"Alice", "Bob", "communicatesWith",  
	"Alice", "Trent", "trusts",  
	"Bob", "Trent", "hasPermission",  
	"Eve", "Alice", "attacks",  
	"Mallory", "Alice", "attacks",  
	"Mallory", "Bob", "attacks"
]; 
edges 
| make-graph source --> destination with nodes on name
| graph-to-table nodes with_node_id=NodeId

Output

NodeIdnametypeage
-3122868243544336885AlicePerson23
-7133945255344544237BobPerson31
4363395278938690453EvePerson17
2533909231875758225TrentSystem99
3855580634910899594Mallory

Get nodes and edges

In this example, the graph-to-table operator exports the nodes and edges from a graph to a table.

let nodes = datatable(name:string, type:string, age:long) 
[ 
	"Alice", "Person", 23,  
	"Bob", "Person", 31,  
	"Eve", "Person", 17,
	"Trent", "System", 99
]; 
let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
	"Alice", "Bob", "communicatesWith",  
	"Alice", "Trent", "trusts",  
	"Bob", "Trent", "hasPermission",  
	"Eve", "Alice", "attacks",  
	"Mallory", "Alice", "attacks",  
	"Mallory", "Bob", "attacks"
]; 
edges 
| make-graph source --> destination with nodes on name
| graph-to-table nodes as N with_node_id=NodeId, edges as E with_source_id=SourceId;
N; 
E

Output table 1

NodeIdnametypeage
-3122868243544336885AlicePerson23
-7133945255344544237BobPerson31
4363395278938690453EvePerson17
2533909231875758225TrentSystem99
3855580634910899594Mallory

Output table 2

SourceId | source | destination | edge_type
-3122868243544336885 | Alice | Bob | communicatesWith
-3122868243544336885 | Alice | Trent | trusts
-7133945255344544237 | Bob | Trent | hasPermission
4363395278938690453 | Eve | Alice | attacks
3855580634910899594 | Mallory | Alice | attacks
3855580634910899594 | Mallory | Bob | attacks

7.7 - Kusto Query Language (KQL) graph semantics overview

Learn about how to contextualize data in queries using KQL graph semantics

Kusto Query Language (KQL) graph semantics overview

Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. The structure of a graph comprises nodes and edges that connect them. Both nodes and edges can have properties that describe them.

Graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs. For example, the following graph illustrates a social network that consists of four nodes and three edges. Each node has a property for its name, such as Bob, and each edge has a property for its type, such as reportsTo.

Diagram that shows a social network as a graph.

Graphs store data differently from relational databases, which use tables and need indexes and joins to connect related data. In graphs, each node has a direct pointer to its neighbors (adjacency), so there’s no need to index or join anything, making it easy and fast to traverse the graph. Graph queries can use the graph structure and meaning to do complex and powerful operations, such as finding paths, patterns, shortest distances, communities, or centrality measures.

You can create and query graphs using KQL graph semantics, which has a simple and intuitive syntax that works well with the existing KQL features. You can also mix graph queries with other KQL features, such as time-based, location-based, and machine-learning queries, to do more advanced and powerful data analysis. By using KQL with graph semantics, you get the speed and scale of KQL queries with the flexibility and expressiveness of graphs.

For example, you can use:

  • Time-based queries to analyze the evolution of a graph over time, such as how the network structure or the node properties change
  • Geospatial queries to analyze the spatial distribution or proximity of nodes and edges, such as how the location or distance affects the relationship
  • Machine learning queries to apply various algorithms or models to graph data, such as clustering, classification, or anomaly detection

How does it work?

Every query of the graph semantics in Kusto requires creating a new graph representation. You use a graph operator that converts tabular expressions for edges and optionally nodes into a graph representation of the data. Once the graph is created, you can apply different operations to further enhance or examine the graph data.

The graph semantics extension uses an in-memory graph engine that works on the data in the memory of your cluster, making graph analysis interactive and fast. The memory consumption of a graph representation depends on the number of nodes and edges and their respective properties. The graph engine uses a property graph model that supports arbitrary properties for nodes and edges. It also integrates with all the existing scalar operators of KQL, which lets you write expressive and complex graph queries that use the full power and functionality of KQL.
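
The following minimal sketch illustrates this flow: a tabular expression of edges is converted into a graph with make-graph, and the graph is then queried with graph-match. The table and column names (Employees, employee, manager) are hypothetical:

let Employees = datatable(employee:string, manager:string)
[
    "Bob", "Alice",
    "Carol", "Alice",
    "Dave", "Bob"
];
Employees
| make-graph employee --> manager with_node_id=name
| graph-match (worker)-[reportsTo]->(boss)
  where boss.name == "Alice"
  project Worker = worker.name, Manager = boss.name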

Why use graph semantics in KQL?

There are several reasons to use graph semantics in KQL, such as the following examples:

  • KQL doesn’t support recursive joins, so you have to explicitly define the traversals you want to run (see Scenario: Friends of a friend). After building a graph with the make-graph operator, you can define hops of variable length with graph-match, which is useful when the relationship distance or depth isn’t fixed; a sketch of such a traversal follows this list. For example, you can use this approach to find all the resources that are connected in a graph or all the places you can reach from a source in a transportation network.

  • Time-aware graphs are a unique feature of graph semantics in KQL that allow users to model graph data as a series of graph manipulation events over time. Users can examine how the graph evolves over time, such as how the graph’s network structure or the node properties change, or how the graph events or anomalies happen. For example, users can use time series queries to discover trends, patterns, or outliers in the graph data, such as how the network density, centrality, or modularity change over time

  • The intellisense feature of the KQL query editor assists users in writing and executing queries in the query language. It provides syntax highlighting, autocompletion, error checking, and suggestions. It also helps users with the graph semantics extension by offering graph-specific keywords, operators, functions, and examples to guide users through the graph creation and querying process.
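
The following sketch shows such a variable-length traversal. The Connections table, its values, and the 1..5 hop range are illustrative only:

let Connections = datatable(source:string, destination:string)
[
    "A", "B",
    "B", "C",
    "C", "D"
];
Connections
| make-graph source --> destination with_node_id=name
| graph-match (origin)-[hops*1..5]->(target)
  where origin.name == "A"
  project Start = origin.name, Reached = target.name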

Limits

The following are some of the main limits of the graph semantics feature in KQL:

  • You can only create or query graphs that fit into the memory of one cluster node.
  • Graph data isn’t persisted or distributed across cluster nodes, and is discarded after the query execution.

Therefore, when using the graph semantics feature in KQL, you should consider the memory consumption and performance implications of creating and querying large or dense graphs. Where possible, use filters, projections, and aggregations to reduce the graph size and complexity.

7.8 - make-graph operator

Learn how to use the make-graph operator to build a graph structure from tabular inputs of edges and nodes.

The make-graph operator builds a graph structure from tabular inputs of edges and nodes.

Syntax

Edges | make-graph SourceNodeId --> TargetNodeId [ with Nodes1 on NodeId1 [, Nodes2 on NodeId2 ]]

Edges | make-graph SourceNodeId --> TargetNodeId [ with_node_id= DefaultNodeId ]

Parameters

Name | Type | Required | Description
Edges | string | ✔️ | The tabular source containing the edges of the graph; each row represents an edge in the graph.
SourceNodeId | string | ✔️ | The column in Edges with the source node IDs of the edges.
TargetNodeId | string | ✔️ | The column in Edges with the target node IDs of the edges.
Nodes | string | | The tabular expressions containing the properties of the nodes in the graph.
NodeId | string | | The columns with the node IDs in Nodes.
DefaultNodeId | string | | The name of the column for the default node ID.

Returns

The make-graph operator returns a graph expression and must be followed by a graph operator. Each row in the source Edges expression becomes an edge in the graph with properties that are the column values of the row. Each row in the Nodes tabular expression becomes a node in the graph with properties that are the column values of the row. Nodes that appear in the Edges table but don’t have a corresponding row in the Nodes table are created as nodes with the corresponding node ID and empty properties.

Users can handle node information in the following ways:

  1. No node information required: make-graph completes with source and target.
  2. Explicit node properties: use up to two tabular expressions using “with Nodes1 on NodeId1 [, Nodes2 on NodeId2 ].”
  3. Default node identifier: use “with_node_id= DefaultNodeId.”

Example

Edges and nodes graph

The following example builds a graph from edges and nodes tables. The nodes represent people and systems, and the edges represent different relationships between nodes. The make-graph operator builds the graph. Then, the graph-match operator is used with a graph pattern to search for attack paths leading to the "Trent" system node.

let nodes = datatable(name:string, type:string, age:int) 
[ 
  "Alice", "Person", 23,  
  "Bob", "Person", 31,  
  "Eve", "Person", 17,  
  "Mallory", "Person", 29,  
  "Trent", "System", 99 
]; 
let edges = datatable(Source:string, Destination:string, edge_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Trent", "trusts",  
  "Bob", "Trent", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
edges 
| make-graph Source --> Destination with nodes on name 
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent) 
  where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission" 
  project Attacker = mallory.name, Compromised = compromised.name, System = trent.name

Output

Attacker | Compromised | System
Mallory | Bob | Trent

Default node identifier

The following example builds a graph using only edges, with the name property as the default node identifier. This approach is useful when creating a graph from a tabular expression of edges, ensuring that the node identifier is available for the constraints section of the subsequent graph-match operator.

let edges = datatable(source:string, destination:string, edge_type:string) 
[ 
  "Alice", "Bob", "communicatesWith",  
  "Alice", "Trent", "trusts",  
  "Bob", "Trent", "hasPermission",  
  "Eve", "Alice", "attacks",  
  "Mallory", "Alice", "attacks",  
  "Mallory", "Bob", "attacks"  
]; 
edges 
| make-graph source --> destination with_node_id=name
| graph-match (mallory)-[attacks]->(compromised)-[hasPermission]->(trent) 
  where mallory.name == "Mallory" and trent.name == "Trent" and attacks.edge_type == "attacks" and hasPermission.edge_type == "hasPermission" 
  project Attacker = mallory.name, Compromised = compromised.name, System = trent.name

Output

Attacker | Compromised | System
Mallory | Bob | Trent

7.9 - Scenarios for using Kusto Query Language (KQL) graph semantics

Learn about common scenarios for using Kusto Query Language (KQL) graph semantics.

What are common scenarios for using Kusto Query Language (KQL) graph semantics?

Graph semantics in Kusto Query Language (KQL) allows you to model and query data as graphs. There are many scenarios where graphs are useful for representing complex and dynamic data that involve many-to-many, hierarchical, or networked relationships, such as social networks, recommendation systems, connected assets, or knowledge graphs.

In this article, you learn about the following common scenarios for using KQL graph semantics:

Friends of a friend

One common use case for graphs is to model and query social networks, where nodes are users and edges are friendships or interactions. For example, imagine we have a table called Users that has data about users, such as their name and organization, and a table called Knows that has data about the friendships between users as shown in the following diagram:

Diagram that shows a graph of friends of a friend.

Without using graph semantics in KQL, you could find friends of a friend by using multiple joins, as follows:

let Users = datatable (UserId: string, name: string, org: string)[]; // nodes
let Knows = datatable (FirstUser: string, SecondUser: string)[]; // edges
Users
| where org == "Contoso"
| join kind=inner (Knows) on $left.UserId == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser == $right.UserId
| join kind=inner (Knows) on $left.SecondUser == $right.FirstUser
| join kind=innerunique(Users) on $left.SecondUser1 == $right.UserId
| where UserId != UserId1
| project name, name1, name2

You can use graph semantics in KQL to perform the same query in a more intuitive and efficient way. The following query uses the make-graph operator to create a directed graph from FirstUser to SecondUser and enriches the properties on the nodes with the columns provided by the Users table. Once the graph is instantiated, the graph-match operator provides the friend-of-a-friend pattern including filters and a projection that results in a tabular output.

let Users = datatable (UserId:string , name:string , org:string)[]; // nodes
let Knows = datatable (FirstUser:string , SecondUser:string)[]; // edges
Knows
| make-graph FirstUser --> SecondUser with Users on UserId
| graph-match (user)-->(middle_man)-->(friendOfAFriend)
    where user.org == "Contoso" and user.UserId != friendOfAFriend.UserId
    project contoso_person = user.name, middle_man = middle_man.name, kusto_friend_of_friend = friendOfAFriend.name

Insights from log data

In some use cases, you want to gain insights from a simple flat table containing time series information, such as log data. The data in each row is a string that contains raw data. To create a graph from this data, you must first identify the entities and relationships that are relevant to the graph analysis. For example, suppose you have a table called rawLogs from a web server that contains information about requests, such as the timestamp, the source IP address, the destination resource, and much more.

The following table shows an example of the raw data:

let rawLogs = datatable (rawLog: string) [
    "31.56.96.51 - - [2019-01-22 03:54:16 +0330] \"GET /product/27 HTTP/1.1\" 200 5379 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
    "31.56.96.51 - - [2019-01-22 03:55:17 +0330] \"GET /product/42 HTTP/1.1\" 200 5667 \"https://www.contoso.com/m/filter/b113\" \"some client\" \"-\"",
    "54.36.149.41 - - [2019-01-22 03:56:14 +0330] \"GET /product/27 HTTP/1.1\" 200 30577 \"-\" \"some client\" \"-\""
];

One possible way to model a graph from this table is to treat the source IP addresses as nodes and the web requests to resources as edges. You can use the parse operator to extract the columns you need for the graph and then you can create a graph that represents the network traffic and interactions between different sources and destinations. To create the graph, you can use the make-graph operator specifying the source and destination columns as the edge endpoints, and optionally providing additional columns as edge or node properties.

The following query creates a graph from the raw logs:

let parsedLogs = rawLogs
    | parse rawLog with ipAddress: string " - - [" timestamp: datetime "] \"" httpVerb: string " " resource: string " " *
    | project-away rawLog;
let edges = parsedLogs;
let nodes =
    union
        (parsedLogs
        | distinct ipAddress
        | project nodeId = ipAddress, label = "IP address"),
        (parsedLogs | distinct resource | project nodeId = resource, label = "resource");
let graph = edges
    | make-graph ipAddress --> resource with nodes on nodeId;

This query parses the raw logs and creates a directed graph where the nodes are either IP addresses or resources and each edge is a request from the source to the destination, with the timestamp and HTTP verb as edge properties.

Diagram that shows a graph of the parsed log data.

Once the graph is created, you can use the graph-match operator to query the graph data using patterns, filters, and projections. For example, you can create a pattern that makes a simple recommendation based on the resources that other IP addresses requested within the last five minutes, as follows:

graph
| graph-match (startIp)-[request]->(resource)<--(otherIP)-[otherRequest]->(otherResource)
    where startIp.label == "IP address" and //start with an IP address
    resource.nodeId != otherResource.nodeId and //recommending a different resource
    startIp.nodeId != otherIP.nodeId and //only other IP addresses are interesting
    (request.timestamp - otherRequest.timestamp < 5m) //filter on recommendations based on the last 5 minutes
    project Recommendation=otherResource.nodeId

Output

Recommendation
/product/42

The query returns “/product/42” as a recommendation based on a raw text-based log.

8 - Limits and Errors

8.1 - Query consistency

This article describes Query consistency.

Query consistency refers to how queries and updates are synchronized. There are two supported modes of query consistency:

  • Strong consistency: Strong consistency ensures immediate access to the most recent updates, such as data appends, deletions, and schema modifications. Strong consistency is the default consistency mode. Due to synchronization, this consistency mode performs slightly less well than weak consistency mode in terms of concurrency.

  • Weak consistency: With weak consistency, there may be a delay before query results reflect the latest database updates. Typically, this delay ranges from 1 to 2 minutes. Weak consistency can support higher query concurrency rates than strong consistency.

For example, if 1000 records are ingested each minute into a table in the database, queries over that table running with strong consistency will have access to the most-recently ingested records, whereas queries over that table running with weak consistency may not have access to some of the records from the last few minutes.

Use cases for strong consistency

If you have a strong dependency on updates that occurred in the database in the last few minutes, use strong consistency.

For example, the following query counts the number of error records in the last five minutes and triggers an alert if that count is larger than 0. This use case is best handled with strong consistency, since your insights may be skewed if you don't have access to records ingested in the past few minutes, as may be the case with weak consistency.

my_table
| where timestamp between(ago(5m)..now())
| where level == "error"
| count

In addition, strong consistency should be used when database metadata is large. For instance, if there are millions of data extents in the database, using weak consistency would result in query heads downloading and deserializing extensive metadata artifacts from persistent storage, which may increase the likelihood of transient failures in downloads and related operations.

Use cases for weak consistency

If you don’t have a strong dependency on updates that occurred in the database in the last few minutes, and you need high query concurrency, use weak consistency.

For example, the following query counts the number of error records per week in the last 90 days. Weak consistency is appropriate in this case, since your insights are unlikely to be impacted if records ingested in the past few minutes are omitted.

my_table
| where timestamp between(ago(90d) .. now())
| where level == "error"
| summarize count() by level, startofweek(timestamp)

Weak consistency modes

The following table summarizes the four modes of weak query consistency.

Mode | Description
Random | Queries are routed randomly to one of the nodes in the cluster that can serve as a weakly consistent query head.
Affinity by database | Queries within the same database are routed to the same weakly consistent query head, ensuring consistent execution for that database.
Affinity by query text | Queries with the same query text hash are routed to the same weakly consistent query head, which is beneficial for leveraging query caching.
Affinity by session ID | Queries with the same session ID hash are routed to the same weakly consistent query head, ensuring consistent execution within a session.

Affinity by database

The affinity by database mode ensures that queries running against the same database are executed against the same version of the database, although not necessarily the most recent version of the database. This mode is useful when ensuring consistent execution within a specific database is important. However, if there’s an imbalance in the number of queries across databases, this mode may result in uneven load distribution.

Affinity by query text

The affinity by query text mode is beneficial when queries leverage the Query results cache. This mode routes repeating queries frequently executed by the same identity to the same query head, allowing them to benefit from cached results and reducing the load on the cluster.

Affinity by session ID

The affinity by session ID mode ensures that queries belonging to the same user activity or session are executed against the same version of the database, although not necessarily the most recent one. To use this mode, the session ID needs to be explicitly specified in each query’s client request properties. This mode is helpful in scenarios where consistent execution within a session is essential.

How to specify query consistency

You can specify the query consistency mode by the client sending the request or using a server side policy. If it isn’t specified by either, the default mode of strong consistency applies.

  • Client sending the request: Use the queryconsistency client request property. This method sets the query consistency mode for a specific query and doesn’t affect the overall effective consistency mode, which is determined by the default or the server-side policy. For more information, see client request properties.

  • Server side policy: Use the QueryConsistency property of the Query consistency policy. This method sets the query consistency mode at the workload group level, which eliminates the need for users to specify the consistency mode in their client request properties and allows for enforcing desired consistency modes. For more information, see Query consistency policy.

8.2 - Query limits

This article describes Query limits.

Kusto is an ad-hoc query engine that hosts large datasets and attempts to satisfy queries by holding all relevant data in-memory. There’s an inherent risk that queries will monopolize the service resources without bounds. Kusto provides several built-in protections in the form of default query limits. If you’re considering removing these limits, first determine whether you actually gain any value by doing so.

Limit on request concurrency

Request concurrency is a limit that is imposed on several requests running at the same time.

  • The default value of the limit depends on the SKU the database is running on, and is calculated as: Cores-Per-Node x 10.
    • For example, for a database that’s set up on D14v2 SKU, where each machine has 16 vCores, the default limit is 16 cores x10 = 160.
  • The default value can be changed by configuring the request rate limit policy of the default workload group.
    • The actual number of requests that can run concurrently on a database depends on various factors. The most dominant factors are database SKU, database’s available resources, and usage patterns. The policy can be configured based on load tests performed on production-like usage patterns.

For more information, see Optimize for high concurrency with Azure Data Explorer.

Limit on result set size (result truncation)

Result truncation is a limit set by default on the result set returned by the query. Kusto limits the number of records returned to the client to 500,000, and the overall data size for those records to 64 MB. When either of these limits is exceeded, the query fails with a “partial query failure”. Exceeding overall data size will generate an exception with the message:

The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal data size limit 67108864 (E_QUERY_RESULT_SET_TOO_LARGE).'

Exceeding the number of records will fail with an exception that says:

The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal record count limit 500000 (E_QUERY_RESULT_SET_TOO_LARGE).'

There are several strategies for dealing with this error.

  • Reduce the result set size by modifying the query to only return interesting data. This strategy is useful when the initial failing query is too “wide”. For example, the query doesn’t project away data columns that aren’t needed.
  • Reduce the result set size by shifting post-query processing, such as aggregations, into the query itself (see the example after this list). This strategy is useful in scenarios where the output of the query is fed to another processing system, which then performs further aggregation.
  • Switch from queries to using data export when you want to export large sets of data from the service.
  • Instruct the service to suppress this query limit using set statements listed below or flags in client request properties.

Methods for reducing the result set size produced by the query include aggregating with the summarize operator, limiting the number of rows with take, and removing unneeded columns with project or project-away.
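
For example, instead of returning every matching record and counting them on the client, the following hedged sketch performs the aggregation in the query itself; MyTable, timestamp, and level are illustrative names used elsewhere in this article:

MyTable
| where timestamp > ago(1d)
| where level == "error"
| summarize ErrorCount = count() by bin(timestamp, 1h)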

You can disable result truncation by using the notruncation request option. We recommend that some form of limitation is still put in place.

For example:

set notruncation;
MyTable | take 1000000

It’s also possible to have more refined control over result truncation by setting the value of truncationmaxsize (maximum data size in bytes, defaults to 64 MB) and truncationmaxrecords (maximum number of records, defaults to 500,000). For example, the following query sets result truncation to happen at either 1,105 records or 1 MB, whichever is exceeded.

set truncationmaxsize=1048576;
set truncationmaxrecords=1105;
MyTable | where User=="UserId1"

Removing the result truncation limit means that you intend to move bulk data out of Kusto.

You can remove the result truncation limit either for export purposes by using the .export command or for later aggregation. If you choose later aggregation, consider aggregating by using Kusto.

Kusto provides a number of client libraries that can handle “infinitely large” results by streaming them to the caller. Use one of these libraries, and configure it to streaming mode. For example, use the .NET Framework client (Microsoft.Azure.Kusto.Data) and either set the streaming property of the connection string to true, or use the ExecuteQueryV2Async() call that always streams results. For an example of how to use ExecuteQueryV2Async(), see the HelloKustoV2 application.

You may also find the C# streaming ingestion sample application helpful.

Result truncation is applied by default, not just to the result stream returned to the client. It’s also applied by default to any subquery that one cluster or Eventhouse issues to another in a cross-cluster or cross-Eventhouse query, with similar effects.

Setting multiple result truncation properties

The following apply when using set statements, and/or when specifying flags in client request properties.

  • If notruncation is set, and any of truncationmaxsize, truncationmaxrecords, or query_take_max_records are also set - notruncation is ignored.
  • If truncationmaxsize, truncationmaxrecords and/or query_take_max_records are set multiple times - the lower value for each property applies.

Limit on memory consumed by query operators (E_RUNAWAY_QUERY)

Kusto limits the memory that each query operator can consume to protect against “runaway” queries. This limit might be reached by some query operators, such as join and summarize, that operate by holding significant data in memory. By default the limit is 5GB (per node), and it can be increased by setting the request option maxmemoryconsumptionperiterator:

set maxmemoryconsumptionperiterator=16106127360;
MyTable | summarize count() by User

When this limit is reached, a partial query failure is emitted with a message that includes the text E_RUNAWAY_QUERY.

The ClusterBy operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete E_RUNAWAY_QUERY.

The DemultiplexedResultSetCache operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The ExecuteAndCache operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The HashJoin operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The Sort operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The Summarize operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The TopNestedAggregator operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

The TopNested operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete (E_RUNAWAY_QUERY).

If maxmemoryconsumptionperiterator is set multiple times, for example in both client request properties and using a set statement, the lower value applies.

The maximum supported value for this request option is 32212254720 (30 GB).

An additional limit that might trigger an E_RUNAWAY_QUERY partial query failure is a limit on the max accumulated size of strings held by a single operator. This limit cannot be overridden by the request option above:

Runaway query (E_RUNAWAY_QUERY). Aggregation over string column exceeded the memory budget of 8GB during evaluation.

When this limit is exceeded, the relevant query operator is most likely a join, summarize, or make-series. To work around the limit, modify the query to use the shuffle query strategy. (This is also likely to improve the performance of the query.)

In all cases of E_RUNAWAY_QUERY, an additional option (beyond increasing the limit by setting the request option and changing the query to use a shuffle strategy) is to switch to sampling. The two queries below show how to do the sampling. The first query is a statistical sampling, using a random number generator. The second query is deterministic sampling, done by hashing some column from the dataset, usually some ID.

T | where rand() < 0.1 | ...

T | where hash(UserId, 10) == 1 | ...

Limit on memory per node

Max memory per query per node is another limit used to protect against “runaway” queries. This limit, represented by the request option max_memory_consumption_per_query_per_node, sets an upper bound on the amount of memory that can be used on a single node for a specific query.

set max_memory_consumption_per_query_per_node=68719476736;
MyTable | ...

If max_memory_consumption_per_query_per_node is set multiple times, for example in both client request properties and using a set statement, the lower value applies.

If the query uses summarize, join, or make-series operators, you can use the shuffle query strategy to reduce memory pressure on a single machine.
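
For example, the shuffle strategy can be requested with the hint.strategy keyword on summarize or join; the table and column names below are illustrative:

MyTable
| summarize hint.strategy = shuffle count() by UserId

MyTable
| join hint.strategy = shuffle (OtherTable) on UserId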

Limit execution timeout

Server timeout is a service-side timeout that is applied to all requests. Timeout on running requests (queries and management commands) is enforced at multiple points along the Kusto request path:

  • client library (if used)
  • service endpoint that accepts the request
  • service engine that processes the request

By default, timeout is set to four minutes for queries, and 10 minutes for management commands. This value can be increased if needed (capped at one hour).

  • Various client tools support changing the timeout as part of their global or per-connection settings. For example, in Kusto.Explorer, use Tools > Options > Connections > Query Server Timeout.
  • Programmatically, SDKs support setting the timeout through the servertimeout property. For example, in .NET SDK this is done through a client request property, by setting a value of type System.TimeSpan.

Notes about timeouts

  • On the client side, the timeout is applied from the request being created until the time that the response starts arriving to the client. The time it takes to read the payload back at the client isn’t treated as part of the timeout. It depends on how quickly the caller pulls the data from the stream.
  • Also on the client side, the actual timeout value used is slightly higher than the server timeout value requested by the user. This difference is to allow for network latencies.
  • To automatically use the maximum allowed request timeout, set the client request property norequesttimeout to true.

Limit on query CPU resource usage

Kusto lets you run queries and use all the available CPU resources that the database has. It attempts to do a fair round-robin between queries if more than one is running. This method yields the best performance for query-defined functions. At other times, you may want to limit the CPU resources used for a particular query. If you run a “background job”, for example, the system might tolerate higher latencies to give concurrent inline queries high priority.

Kusto supports specifying two request properties when running a query. The properties are query_fanout_threads_percent and query_fanout_nodes_percent. Both properties are integers that default to the maximum value (100), but may be reduced for a specific query to some other value.

The first, query_fanout_threads_percent, controls the fanout factor for thread use. When this property is set to 100%, all CPUs on each node are assigned (for example, 16 CPUs on nodes deployed as Azure D14). When this property is set to 50%, half of the CPUs are used, and so on. The numbers are rounded up to a whole CPU, so it’s safe to set the property value to 0.
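
For example, the following sketch uses a set statement to limit a query to roughly half of the CPUs on each node; MyTable and the column names are illustrative:

set query_fanout_threads_percent = 50;
MyTable
| summarize count() by bin(timestamp, 1h)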

The second, query_fanout_nodes_percent, controls how many of the query nodes to use per subquery distribution operation. It functions in a similar manner.

If query_fanout_nodes_percent or query_fanout_threads_percent are set multiple times, for example, in both client request properties and using a set statement - the lower value for each property applies.

Limit on query complexity

During query execution, the query text is transformed into a tree of relational operators representing the query. If the tree depth exceeds an internal threshold, the query is considered too complex for processing, and will fail with an error code. The failure indicates that the relational operators tree exceeds its limits.

The following examples show common query patterns that can cause the query to exceed this limit and fail:

  • a long list of binary operators that are chained together. For example:
T
| where Column == "value1" or
        Column == "value2" or
        .... or
        Column == "valueN"

For this specific case, rewrite the query using the in() operator.

T
| where Column in ("value1", "value2".... "valueN")
  • a query that uses the union operator over a very wide schema. This is especially common because the default flavor of union returns an “outer” union schema, meaning the output includes all columns of the underlying tables.

The suggestion in this case is to review the query and reduce the columns being used by the query.
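
For example, projecting only the needed columns before the union keeps the combined schema narrow; Table1, Table2, and the column names are illustrative:

union
    (Table1 | project timestamp, level, message),
    (Table2 | project timestamp, level, message)
| summarize count() by level
// Alternatively, union kind=inner returns only the columns common to all inputs.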

8.3 - Partial query failures

8.3.1 - Kusto query result set exceeds internal limit

This article describes the “Query result set has exceeded the internal … limit” partial query failure.

A query result set has exceeded the internal … limit is a kind of partial query failure that happens when the query’s result has exceeded one of two limits:

  • A limit on the number of records (record count limit, set by default to 500,000)
  • A limit on the total amount of data (data size limit, set by default to 67,108,864 (64MB))

There are several possible courses of action, such as modifying the query to return less data (for example, by aggregating with summarize or removing unneeded columns with project-away), or adjusting the truncation limits described in Query limits.

[!NOTE] We don’t recommend that you increase the query limit, since the limits exist to protect the database. The limits make sure that a single query doesn’t disrupt concurrent queries running on the database.

8.3.2 - Overflows

This article describes Overflows.

An overflow occurs when the result of a computation is too large for the destination type. The overflow usually leads to a partial query failure.

For example, the following query will result in an overflow.

let Weight = 92233720368547758;
range x from 1 to 3 step 1
| summarize percentilesw(x, Weight * 100, 50)

Kusto’s percentilesw() implementation accumulates the Weight expression for values that are “close enough”. In this case, the accumulation triggers an overflow because it doesn’t fit into a signed 64-bit integer.

Usually, overflows are a result of a “bug” in the query, since Kusto uses 64-bit types for arithmetic computations. The best course of action is to look at the error message, and identify the function or aggregation that triggered the overflow. Make sure the input arguments evaluate to values that make sense.

8.3.3 - Runaway queries

This article describes Runaway queries.

A runaway query is a kind of partial query failure that happens when some internal query limit was exceeded during query execution.

For example, the following error may be reported: HashJoin operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete.

There are several possible courses of action.

  • Change the query to consume fewer resources. For example, if the error indicates that the query result set is too large, you can reduce the amount of data the query returns by aggregating with summarize, limiting the rows with take, or removing unneeded columns with project-away.
  • Increase the relevant query limit temporarily for that query. For more information, see query limits - limit on memory per iterator. This method, however, isn’t recommended. The limits exist to protect the cluster or Eventhouse and to make sure that a single query doesn’t disrupt concurrent queries running on it.

9 - Plugins

9.1 - Data reshaping plugins

9.1.1 - bag_unpack plugin

Learn how to use the bag_unpack plugin to unpack a dynamic column.

The bag_unpack plugin unpacks a single column of type dynamic, by treating each property bag top-level slot as a column. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate bag_unpack( Column [, OutputColumnPrefix ] [, columnsConflict ] [, ignoredProperties ] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose column Column is to be unpacked.
Columndynamic✔️The column of T to unpack.
OutputColumnPrefixstringA common prefix to add to all columns produced by the plugin.
columnsConflictstringThe direction for column conflict resolution. Valid values:
error - Query produces an error (default)
replace_source - Source column is replaced
keep_source - Source column is kept
ignoredPropertiesdynamicAn optional set of bag properties to be ignored.
OutputSchemaThe names and types for the expected columns of the bag_unpack plugin output. Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. For syntax information, see Output schema syntax.

Output schema syntax

( ColumnName : ColumnType [, …] )

To add all columns of the input table to the plugin output, use a wildcard * as the first parameter, as follows:

( * , ColumnName : ColumnType [, …] )

Returns

The bag_unpack plugin returns a table with as many records as its tabular input (T). The schema of the table is the same as the schema of its tabular input with the following modifications:

  • The specified input column (Column) is removed.
  • The schema is extended with as many columns as there are distinct slots in the top-level property bag values of T. The name of each column corresponds to the name of each slot, optionally prefixed by OutputColumnPrefix. Its type is either the type of the slot, if all values of the same slot have the same type, or dynamic, if the values differ in type.

Examples

Expand a bag

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d)

Output

AgeName
20John
40Dave
30Jasmine

Expand a bag with OutputColumnPrefix

Expand a bag and use the OutputColumnPrefix option to produce column names that begin with the prefix ‘Property_’.

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, 'Property_')

Output

Property_AgeProperty_Name
20John
40Dave
30Jasmine

Expand a bag with columnsConflict

Expand a bag and use the columnsConflict option to resolve conflicts between existing columns and columns produced by the bag_unpack() operator.

datatable(Name:string, d:dynamic)
[
    'Old_name', dynamic({"Name": "John", "Age":20}),
    'Old_name', dynamic({"Name": "Dave", "Age":40}),
    'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='replace_source') // Use new name

Output

AgeName
20John
40Dave
30Jasmine
datatable(Name:string, d:dynamic)
[
    'Old_name', dynamic({"Name": "John", "Age":20}),
    'Old_name', dynamic({"Name": "Dave", "Age":40}),
    'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='keep_source') // Keep old name

Output

AgeName
20Old_name
40Old_name
30Old_name

Expand a bag with ignoredProperties

Expand a bag and use the ignoredProperties option to ignore certain properties in the property bag.

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20, "Address": "Address-1" }),
    dynamic({"Name": "Dave", "Age":40, "Address": "Address-2"}),
    dynamic({"Name": "Jasmine", "Age":30, "Address": "Address-3"}),
]
// Ignore 'Age' and 'Address' properties
| evaluate bag_unpack(d, ignoredProperties=dynamic(['Address', 'Age']))

Output

Name
John
Dave
Jasmine

Expand a bag with a query-defined OutputSchema

Expand a bag and use the OutputSchema option to allow various optimizations to be evaluated before running the actual query.

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d) : (Name:string, Age:long)

Output

NameAge
John20
Dave40
Jasmine30

Expand a bag and use the OutputSchema option to allow various optimizations to be evaluated before running the actual query. Use a wildcard * to return all columns of the input table.

datatable(d:dynamic, Description: string)
[
    dynamic({"Name": "John", "Age":20}), "Student",
    dynamic({"Name": "Dave", "Age":40}), "Teacher",
    dynamic({"Name": "Jasmine", "Age":30}), "Student",
]
| evaluate bag_unpack(d) : (*, Name:string, Age:long)

Output

DescriptionNameAge
StudentJohn20
TeacherDave40
StudentJasmine30

9.1.2 - narrow plugin

Learn how to use the narrow plugin to display a wide table.

The narrow plugin “unpivots” a wide table into a table with three columns:

  • Row number
  • Column type
  • Column value (as string)

The narrow plugin is designed mainly for display purposes, as it allows wide tables to be displayed comfortably without the need for horizontal scrolling.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate narrow()

Examples

The following example shows an easy way to read the output of the Kusto .show diagnostics management command.

.show diagnostics
 | evaluate narrow()

The result of .show diagnostics itself is a table with a single row and 33 columns. By using the narrow plugin, we “rotate” the output into something like this:

RowColumnValue
0IsHealthyTrue
0IsRebalanceRequiredFalse
0IsScaleOutRequiredFalse
0MachinesTotal2
0MachinesOffline0
0NodeLastRestartedOn2017-03-14 10:59:18.9263023
0AdminLastElectedOn2017-03-14 10:58:41.6741934
0ClusterWarmDataCapacityFactor0.130552847673333
0ExtentsTotal136
0DiskColdAllocationPercentage5
0InstancesTargetBasedOnDataCapacity2
0TotalOriginalDataSize5167628070
0TotalExtentSize1779165230
0IngestionsLoadFactor0
0IngestionsInProgress0
0IngestionsSuccessRate100
0MergesInProgress0
0BuildVersion1.0.6281.19882
0BuildTime2017-03-13 11:02:44.0000000
0ClusterDataCapacityFactor0.130552847673333
0IsDataWarmingRequiredFalse
0RebalanceLastRunOn2017-03-21 09:14:53.8523455
0DataWarmingLastRunOn2017-03-21 09:19:54.1438800
0MergesSuccessRate100
0NotHealthyReason[null]
0IsAttentionRequiredFalse
0AttentionRequiredReason[null]
0ProductVersionKustoRelease_2017.03.13.2
0FailedIngestOperations0
0FailedMergeOperations0
0MaxExtentsInSingleTable64
0TableWithMaxExtentsKustoMonitoringPersistentDatabase.KustoMonitoringTable
0WarmExtentSize1779165230

9.1.3 - pivot plugin

Learn how to use the pivot plugin to rotate a table with specified columns and aggregates the remaining columns.

Rotates a table by turning the unique values from one column in the input table into multiple columns in the output table and performs aggregations as required on any remaining column values that will appear in the final output.

Syntax

T | evaluate pivot(pivotColumn[, aggregationFunction] [,column1 [,column2]]) [: OutputSchema]

Parameters

NameTypeRequiredDescription
pivotColumnstring✔️The column to rotate. Each unique value from this column will be a column in the output table.
aggregationFunctionstringAn aggregation function used to aggregate multiple rows in the input table to a single row in the output table. Currently supported functions: min(), max(), take_any(), sum(), dcount(), avg(), stdev(), variance(), make_list(), make_bag(), make_set(), count(). The default is count().
column1, column2, …stringA column name or comma-separated list of column names. The output table will contain an additional column per each specified column. The default is all columns other than the pivoted column and the aggregation column.
OutputSchemaThe names and types for the expected columns of the pivot plugin output.

Syntax: ( ColumnName : ColumnType [, …] )

Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. An error is raised if the run-time schema doesn’t match the OutputSchema schema.

Returns

Pivot returns the rotated table with specified columns (column1, column2, …) plus all unique values of the pivot columns. Each cell for the pivoted columns will contain the aggregate function computation.

Examples

Pivot by a column

For each EventType and State starting with ‘AL’, count the number of events of this type in this state.

StormEvents
| project State, EventType
| where State startswith "AL"
| where EventType has "Wind"
| evaluate pivot(State)

Output

EventTypeALABAMAALASKA
Thunderstorm Wind3521
High Wind095
Extreme Cold/Wind Chill010
Strong Wind220

Pivot by a column with aggregation function

For each EventType and State starting with ‘AR’, display the total number of direct deaths.

StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect))

Output

EventTypeARKANSASARIZONA
Heavy Rain10
Thunderstorm Wind10
Lightning01
Flash Flood06
Strong Wind10
Heat30

Pivot by a column with aggregation function and a single additional column

The result is identical to the previous example.

StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType)

Output

EventTypeARKANSASARIZONA
Heavy Rain10
Thunderstorm Wind10
Lightning01
Flash Flood06
Strong Wind10
Heat30

Specify the pivoted column, aggregation function, and multiple additional columns

For each event type, source, and state, sum the number of direct deaths.

StormEvents
| where State startswith "AR"
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType, Source)

Output

EventTypeSourceARKANSASARIZONA
Heavy RainEmergency Manager10
Thunderstorm WindEmergency Manager10
LightningNewspaper01
Flash FloodTrained Spotter02
Flash FloodBroadcast Media03
Flash FloodNewspaper01
Strong WindLaw Enforcement10
HeatNewspaper30

Pivot with a query-defined output schema

The following example selects specific columns in the StormEvents table. It uses an explicit schema definition that allows various optimizations to be evaluated before running the actual query.

StormEvents
| project State, EventType
| where EventType has "Wind"
| evaluate pivot(State): (EventType:string, ALABAMA:long, ALASKA:long)

Output

EventTypeALABAMAALASKA
Thunderstorm Wind3521
High Wind095
Marine Thunderstorm Wind00
Strong Wind220
Extreme Cold/Wind Chill010
Cold/Wind Chill00
Marine Strong Wind00
Marine High Wind00

9.2 - General plugins

9.2.1 - dcount_intersect plugin

Learn how to use the dcount_intersect plugin to calculate the intersection between N sets based on hyper log log (hll) values.

Calculates intersection between N sets based on hll values (N in range of [2..16]), and returns N dcount values. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate dcount_intersect(hll_1, hll_2, [, hll_3, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
hll_iThe values of set Si calculated with the hll() function.

Returns

Returns a table with N dcount values (per column, representing set intersections). Column names are s0, s1, … (until n-1).

Given sets S1, S2, .. Sn return values will be representing distinct counts of:
S1,
S1 ∩ S2,
S1 ∩ S2 ∩ S3,
… ,
S1 ∩ S2 ∩ … ∩ Sn

Examples

// Generate numbers from 1 to 100
range x from 1 to 100 step 1
| extend isEven = (x % 2 == 0), isMod3 = (x % 3 == 0), isMod5 = (x % 5 == 0)
// Calculate conditional HLL values (note that '0' is included in each of them as additional value, so we will subtract it later)
| summarize hll_even = hll(iif(isEven, x, 0), 2),
            hll_mod3 = hll(iif(isMod3, x, 0), 2),
            hll_mod5 = hll(iif(isMod5, x, 0), 2) 
// Invoke the plugin that calculates dcount intersections         
| evaluate dcount_intersect(hll_even, hll_mod3, hll_mod5)
| project evenNumbers = s0 - 1,             //                             100 / 2 = 50
          even_and_mod3 = s1 - 1,           // lcm(2,3) = 6, therefore:     100 / 6 = 16
          even_and_mod3_and_mod5 = s2 - 1   // lcm(2,3,5) = 30, therefore:  100 / 30 = 3 

Output

evenNumbers | even_and_mod3 | even_and_mod3_and_mod5
50 | 16 | 3

9.2.2 - infer_storage_schema plugin

Learn how to use the infer_storage_schema plugin to infer the schema of external data.

This plugin infers the schema of external data and returns it as a CSL schema string. The string can be used when creating external tables. The plugin is invoked with the evaluate operator.

Authentication and authorization

In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.

The following table lists the supported authentication methods and any required permissions by storage type.

Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1
Impersonation | Storage Blob Data Reader | Reader
Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1.
Microsoft Entra access token | |
Storage account access key | | This authentication method isn’t supported in Gen1.

Syntax

evaluate infer_storage_schema( Options )

Parameters

NameTypeRequiredDescription
Optionsdynamic✔️A property bag specifying the properties of the request.

Supported properties of the request

NameTypeRequiredDescription
StorageContainersdynamic✔️An array of storage connection strings that represent prefix URI for stored data artifacts.
DataFormatstring✔️One of the supported data formats.
FileExtensionstringIf specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues.
FileNamePrefixstringIf specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process.
ModestringThe schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last.
InferenceOptionsdynamicMore inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true} .

Returns

The infer_storage_schema plugin returns a single result table with a single row and column, containing the CSL schema string.

Example

let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
  ],
  'FileExtension': '.parquet',
  'FileNamePrefix': 'part-',
  'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)

Output

CslSchema
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real

Use the returned schema in external table definition:

.create external table MobileEvents(
    app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real
)
kind=blob
partition by (dt:datetime = bin(event_time, 1d), app:string = app_id)
pathformat = ('app=' app '/dt=' datetime_pattern('yyyyMMdd', dt))
dataformat = parquet
(
    h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
)

9.2.3 - infer_storage_schema_with_suggestions plugin

Learn how to use the infer_storage_schema_with_suggestions plugin to infer the optimal schema of external data.

The infer_storage_schema_with_suggestions plugin infers the schema of external data and returns a JSON object. For each column, the object provides the inferred type, a recommended type, and the recommended mapping transformation. The recommended type and mapping are provided by suggestion logic that determines the optimal type using the following logic:

  • Identity columns: If the inferred type for a column is long and the column name ends with id, the suggested type is string since it provides optimized indexing for identity columns where equality filters are common.
  • Unix datetime columns: If the inferred type for a column is long and one of the unix-time to datetime mapping transformations produces a valid datetime value, the suggested type is datetime and the suggested ApplicableTransformationMapping mapping is the one that produced a valid datetime value.

The plugin is invoked with the evaluate operator. To obtain the inferred schema as a CSL schema string, without suggestions, for use when creating or altering Azure Storage external tables, use the infer_storage_schema plugin.

Authentication and authorization

In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.

The following table lists the supported authentication methods and any required permissions by storage type.

Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1
Impersonation | Storage Blob Data Reader | Reader
Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1.
Microsoft Entra access token | |
Storage account access key | | This authentication method isn’t supported in Gen1.

Syntax

evaluate infer_storage_schema_with_suggestions( Options )

Parameters

NameTypeRequiredDescription
Optionsdynamic✔️A property bag specifying the properties of the request.

Supported properties of the request

NameTypeRequiredDescription
StorageContainersdynamic✔️An array of storage connection strings that represent prefix URI for stored data artifacts.
DataFormatstring✔️One of the supported Data formats supported for ingestion
FileExtensionstringIf specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues.
FileNamePrefixstringIf specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process.
ModestringThe schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last.
InferenceOptionsdynamicMore inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true} .

Returns

The infer_storage_schema_with_suggestions plugin returns a single result table with a single row and column, containing a JSON string.

Example

let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
  ],
  'FileExtension': '.json',
  'FileNamePrefix': 'js-',
  'DataFormat': 'json'
});
evaluate infer_storage_schema_with_suggestions(options)

Example input data

    {
        "source": "DataExplorer",
        "created_at": "2022-04-10 15:47:57",
        "author_id": 739144091473215488,
        "time_millisec":1547083647000
    }

Output

{
  "Columns": [
    {
      "OriginalColumn": {
        "Name": "source",
        "CslType": {
          "type": "string",
          "IsNumeric": false,
          "IsSummable": false
        }
      },
      "RecommendedColumn": {
        "Name": "source",
        "CslType": {
          "type": "string",
          "IsNumeric": false,
          "IsSummable": false
        }
      },
      "ApplicableTransformationMapping": "None"
    },
    {
      "OriginalColumn": {
        "Name": "created_at",
        "CslType": {
          "type": "datetime",
          "IsNumeric": false,
          "IsSummable": true
        }
      },
      "RecommendedColumn": {
        "Name": "created_at",
        "CslType": {
          "type": "datetime",
          "IsNumeric": false,
          "IsSummable": true
        }
      },
      "ApplicableTransformationMapping": "None"
    },
    {
      "OriginalColumn": {
        "Name": "author_id",
        "CslType": {
          "type": "long",
          "IsNumeric": true,
          "IsSummable": true
        }
      },
      "RecommendedColumn": {
        "Name": "author_id",
        "CslType": {
          "type": "string",
          "IsNumeric": false,
          "IsSummable": false
        }
      },
      "ApplicableTransformationMapping": "None"
    },
    {
      "OriginalColumn": {
        "Name": "time_millisec",
        "CslType": {
          "type": "long",
          "IsNumeric": true,
          "IsSummable": true
        }
      },
      "RecommendedColumn": {
        "Name": "time_millisec",
        "CslType": {
          "type": "datetime",
          "IsNumeric": false,
          "IsSummable": true
        }
      },
      "ApplicableTransformationMapping": "DateTimeFromUnixMilliseconds"
    }
  ]
}

9.2.4 - ipv4_lookup plugin

Learn how to use the ipv4_lookup plugin to look up an IPv4 value in a lookup table.

The ipv4_lookup plugin looks up an IPv4 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate ipv4_lookup( LookupTable , SourceIPv4Key , IPv4LookupKey [, ExtraKey1 [.. , ExtraKeyN [, return_unmatched ]]] )

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose column SourceIPv4Key is used for IPv4 matching.
LookupTablestring✔️Table or tabular expression with IPv4 lookup data, whose column LookupKey is used for IPv4 matching. IPv4 values can be masked using IP-prefix notation.
SourceIPv4Keystring✔️The column of T with IPv4 string to be looked up in LookupTable. IPv4 values can be masked using IP-prefix notation.
IPv4LookupKeystring✔️The column of LookupTable with IPv4 string that is matched against each SourceIPv4Key value.
ExtraKey1 .. ExtraKeyNstringAdditional column references that are used for lookup matches. Similar to a join operation: records with equal values are considered matching. Column name references must exist in both the source table T and LookupTable.
return_unmatchedboolA boolean flag that defines if the result should include all or only matching rows (default: false - only matching rows returned).

Returns

The ipv4_lookup plugin returns the result of a join (lookup) based on the IPv4 key. The schema of the table is the union of the source table and the lookup table, similar to the result of the lookup operator.

If the return_unmatched argument is set to true, the resulting table includes both matched and unmatched rows (filled with nulls).

If the return_unmatched argument is set to false, or omitted (the default value of false is used), the resulting table has as many records as matching results. This variant of lookup has better performance compared to return_unmatched=true execution.
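
For example, a minimal sketch for isolating only the unmatched rows, assuming the lookup key column (network) stays empty when no match is found; IPs and IP_Data are defined as in the examples that follow:

IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)
| where isempty(network) // keep only source IPs with no IPv4 match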

Examples

IPv4 lookup - matching rows only

// IP lookup table: IP_Data
// Partial data from: https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
  "111.68.128.0/17","AS","Asia","JP","Japan",
  "5.8.0.0/19","EU","Europe","RU","Russia",
  "223.255.254.0/24","AS","Asia","SG","Singapore",
  "46.36.200.51/32","OC","Oceania","CK","Cook Islands",
  "2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
  '2.20.183.12',   // United Kingdom
  '5.8.1.2',       // Russia
  '192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network)

Output

ipnetworkcontinent_codecontinent_namecountry_iso_codecountry_name
2.20.183.122.20.183.0/24EUEuropeGBUnited Kingdom
5.8.1.25.8.0.0/19EUEuropeRURussia

IPv4 lookup - return both matching and nonmatching rows

// IP lookup table: IP_Data
// Partial data from: 
// https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string,continent_code:string ,continent_name:string ,country_iso_code:string ,country_name:string )
[
    "111.68.128.0/17","AS","Asia","JP","Japan",
    "5.8.0.0/19","EU","Europe","RU","Russia",
    "223.255.254.0/24","AS","Asia","SG","Singapore",
    "46.36.200.51/32","OC","Oceania","CK","Cook Islands",
    "2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
    '2.20.183.12',   // United Kingdom
    '5.8.1.2',       // Russia
    '192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)

Output

ipnetworkcontinent_codecontinent_namecountry_iso_codecountry_name
2.20.183.122.20.183.0/24EUEuropeGBUnited Kingdom
5.8.1.25.8.0.0/19EUEuropeRURussia
192.165.12.17

IPv4 lookup - using source in external_data()

let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
    ['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string)
[
    '2.20.183.12',   // United Kingdom
    '5.8.1.2',       // Russia
    '192.165.12.17', // Sweden
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)

Output

ipnetworkgeoname_idcontinent_codecontinent_namecountry_iso_codecountry_nameis_anonymous_proxyis_satellite_provider
2.20.183.122.20.183.0/242635167EUEuropeGBUnited Kingdom00
5.8.1.25.8.0.0/192017370EUEuropeRURussia00
192.165.12.17192.165.8.0/212661886EUEuropeSESweden00

IPv4 lookup - using extra columns for matching

let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
    ['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string, continent_name:string, country_iso_code:string)
[
    '2.20.183.12',   'Europe', 'GB', // United Kingdom
    '5.8.1.2',       'Europe', 'RU', // Russia
    '192.165.12.17', 'Europe', '',   // Sweden is 'SE' - so it won't be matched
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, continent_name, country_iso_code)

Output

ipcontinent_namecountry_iso_codenetworkgeoname_idcontinent_codecountry_nameis_anonymous_proxyis_satellite_provider
2.20.183.12EuropeGB2.20.183.0/242635167EUUnited Kingdom00
5.8.1.2EuropeRU5.8.0.0/192017370EURussia00

9.2.5 - ipv6_lookup plugin

Learn how to use the ipv6_lookup plugin to look up an IPv6 address in a lookup table.

The ipv6_lookup plugin looks up an IPv6 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate ipv6_lookup( LookupTable , SourceIPv6Key , IPv6LookupKey [, return_unmatched ] )

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose column SourceIPv6Key is used for IPv6 matching.
LookupTablestring✔️Table or tabular expression with IPv6 lookup data, whose column LookupKey is used for IPv6 matching. IPv6 values can be masked using IP-prefix notation.
SourceIPv6Keystring✔️The column of T with IPv6 string to be looked up in LookupTable. IPv6 values can be masked using IP-prefix notation.
IPv6LookupKeystring✔️The column of LookupTable with IPv6 string that is matched against each SourceIPv6Key value.
return_unmatchedboolA boolean flag that defines if the result should include all or only matching rows (default: false - only matching rows returned).

Returns

The ipv6_lookup plugin returns the result of a join (lookup) based on the IPv6 key. The schema of the table is the union of the source table and the lookup table, similar to the result of the lookup operator.

If the return_unmatched argument is set to true, the resulting table includes both matched and unmatched rows (filled with nulls).

If the return_unmatched argument is set to false, or omitted (the default value of false is used), the resulting table has as many records as matching results. This variant of lookup has better performance compared to return_unmatched=true execution.

Examples

IPv6 lookup - matching rows only

// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
  "2001:0db8:85a3::/48","NA","North America","US","United States",
  "2404:6800:4001::/48","AS","Asia","JP","Japan",
  "2a00:1450:4001::/48","EU","Europe","DE","Germany",
  "2800:3f0:4001::/48","SA","South America","BR","Brazil",
  "2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
  "2607:f8b0:4001::/48","NA","North America","CA","Canada",
  "2a02:26f0:4001::/48","EU","Europe","FR","France",
  "2400:cb00:4001::/48","AS","Asia","IN","India",
  "2801:0db8:85a3::/48","SA","South America","AR","Argentina",
  "2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
  "2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
  "2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
  "2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abcd",  // N/A
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abce"   // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network)

Output

networkcontinent_codecontinent_namecountry_iso_codecountry_nameip
2001:0db8:85a3::/48NANorth AmericaUSUnited States2001:0db8:85a3:0000:0000:8a2e:0370:7334
2404:6800:4001::/48ASAsiaJPJapan2404:6800:4001:0001:0000:8a2e:0370:7334
2a02:26f0:4001::/48EUEuropeFRFrance2a02:26f0:4001:0006:0000:8a2e:0370:7334

IPv6 lookup - return both matching and nonmatching rows

// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
  "2001:0db8:85a3::/48","NA","North America","US","United States",
  "2404:6800:4001::/48","AS","Asia","JP","Japan",
  "2a00:1450:4001::/48","EU","Europe","DE","Germany",
  "2800:3f0:4001::/48","SA","South America","BR","Brazil",
  "2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
  "2607:f8b0:4001::/48","NA","North America","CA","Canada",
  "2a02:26f0:4001::/48","EU","Europe","FR","France",
  "2400:cb00:4001::/48","AS","Asia","IN","India",
  "2801:0db8:85a3::/48","SA","South America","AR","Argentina",
  "2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
  "2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
  "2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
  "2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abcd",  // N/A
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abce"   // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network, true)

Output

networkcontinent_codecontinent_namecountry_iso_codecountry_nameip
2001:0db8:85a3::/48NANorth AmericaUSUnited States2001:0db8:85a3:0000:0000:8a2e:0370:7334
2404:6800:4001::/48ASAsiaJPJapan2404:6800:4001:0001:0000:8a2e:0370:7334
2a02:26f0:4001::/48EUEuropeFRFrance2a02:26f0:4001:0006:0000:8a2e:0370:7334
a5e:f127:8a9d:146d:e102:b5d3:c755:abcd
a5e:f127:8a9d:146d:e102:b5d3:c755:abce

9.2.6 - preview plugin

Learn how to use the preview plugin to return two tables, one with the specified number of rows, and the other with the total number of records.

Returns a table with up to the specified number of rows from the input record set, and the total number of records in the input record set.

Syntax

T | evaluate preview(NumberOfRows)

Parameters

NameTypeRequiredDescription
Tstring✔️The table to preview.
NumberOfRowsint✔️The number of rows to preview from the table.

Returns

The preview plugin returns two result tables:

  • A table with up to the specified number of rows. For example, with NumberOfRows set to 50, this table is equivalent to the output of T | take 50.
  • A table with a single row/column, holding the number of records in the input record set. This table is equivalent to the output of T | count (see the sketch after this list).
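
As a sketch, the example below (a preview of 5 rows of StormEvents) yields the same two result tables as running these two queries separately:

// First result table: up to 5 rows
StormEvents
| take 5

// Second result table: the total record count
StormEvents
| count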

Example

StormEvents | evaluate preview(5)

Table1

The following output table only includes the first 6 columns. To see the full result, run the query.

|StartTime|EndTime|EpisodeId|EventId|State|EventType|…|
|–|–|–|–|–|–|–|
|2007-12-30T16:00:00Z|2007-12-30T16:05:00Z|11749|64588|GEORGIA|Thunderstorm Wind|…|
|2007-12-20T07:50:00Z|2007-12-20T07:53:00Z|12554|68796|MISSISSIPPI|Thunderstorm Wind|…|
|2007-09-29T08:11:00Z|2007-09-29T08:11:00Z|11091|61032|ATLANTIC SOUTH|Waterspout|…|
|2007-09-20T21:57:00Z|2007-09-20T22:05:00Z|11078|60913|FLORIDA|Tornado|…|
|2007-09-18T20:00:00Z|2007-09-19T18:00:00Z|11074|60904|FLORIDA|Heavy Rain|…|

Table2

Count
59066

9.2.7 - schema_merge plugin

Learn how to use the schema_merge plugin to merge tabular schema definitions into a unified schema.

Merges tabular schema definitions into a unified schema.

Schema definitions are expected to be in the format produced by the getschema operator.

The schema merge operation joins columns in input schemas and tries to reduce data types to common ones. If data types can’t be reduced, an error is displayed on the problematic column.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate schema_merge(PreserveOrder)

Parameters

NameTypeRequiredDescription
PreserveOrderboolWhen set to true, directs the plugin to validate the column order as defined by the first tabular schema. If the same column appears in several schemas, its ordinal must match the ordinal of the first schema in which it appeared. The default value is true.

Returns

The schema_merge plugin returns output similar to what getschema operator returns.

Examples

Merge with a schema that has a new column appended.

let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, HttpStatus:int, Referrer:string)[] | getschema;
union schema1, schema2 | evaluate schema_merge()

Output

ColumnNameColumnOrdinalDataTypeColumnType
Uri0System.Stringstring
HttpStatus1System.Int32int
Referrer2System.Stringstring

Merge with a schema that has different column ordering (HttpStatus ordinal changes from 1 to 2 in the new variant).

let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge()

Output

ColumnNameColumnOrdinalDataTypeColumnType
Uri0System.Stringstring
Referrer1System.Stringstring
HttpStatus-1ERROR(unknown CSL type:ERROR(columns are out of order))ERROR(columns are out of order)

Merge with a schema that has different column ordering, but with PreserveOrder set to false.

let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge(PreserveOrder = false)

Output

ColumnNameColumnOrdinalDataTypeColumnType
Uri0System.Stringstring
Referrer1System.Stringstring
HttpStatus2System.Int32int

9.3 - Language plugins

9.3.1 - Python plugin

Learn how to use the Python plugin to run user-defined functions using a Python script.

9.3.2 - Python plugin packages

Learn about the Python packages available in the Python plugin.

This article lists the available Python packages in the Python plugin. For more information, see Python plugin.

3.11.7 (Preview)

Python engine 3.11.7 + common data science and ML packages

PackageVersion
annotated-types0.6.0
anytree2.12.1
arrow1.3.0
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gensim4.3.2
humanfriendly10.0
idna3.7
importlib_metadata7.1.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
locket1.0.0
lxml5.2.1
marisa-trie1.1.0
MarkupSafe2.1.5
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
preshed3.0.9
protobuf5.26.1
psutil5.9.8
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
pyparsing3.1.2
pyreadline33.4.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
PyYAML6.0.1
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rpds-py0.18.0
scikit-learn1.4.2
scipy1.13.0
sip6.8.3
six1.16.0
smart-open6.4.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
tldextract5.1.2
toolz0.12.1
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.11.7 DL (Preview)

Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)

PackageVersion
absl-py2.1.0
alembic1.13.1
aniso86019.0.1
annotated-types0.6.0
anytree2.12.1
arch7.0.0
arrow1.3.0
astunparse1.6.3
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
cachetools5.3.3
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
Deprecated1.2.14
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
docker7.1.0
entrypoints0.4
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gast0.5.4
gensim4.3.2
gitdb4.0.11
GitPython3.1.43
google-pasta0.2.0
graphene3.3
graphql-core3.2.3
graphql-relay3.2.0
greenlet3.0.3
grpcio1.64.0
h5py3.11.0
humanfriendly10.0
idna3.7
importlib-metadata7.0.0
iniconfig2.0.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
keras3.3.3
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
libclang18.1.1
locket1.0.0
lxml5.2.1
Mako1.3.5
marisa-trie1.1.0
Markdown3.6
markdown-it-py3.0.0
MarkupSafe2.1.5
mdurl0.1.2
ml-dtypes0.3.2
mlflow2.13.0
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
namex0.0.8
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
opentelemetry-api1.24.0
opentelemetry-sdk1.24.0
opentelemetry-semantic-conventions0.45b0
opt-einsum3.3.0
optree0.11.0
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
pluggy1.5.0
preshed3.0.9
protobuf4.25.3
psutil5.9.8
pyarrow15.0.2
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
Pygments2.18.0
pyparsing3.1.2
pyreadline33.4.1
pytest8.2.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
pywin32306
PyYAML6.0.1
querystring-parser1.2.4
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rich13.7.1
rpds-py0.18.0
rstl0.1.3
scikit-learn1.4.2
scipy1.13.0
seasonal0.3.1
sip6.8.3
six1.16.0
smart-open6.4.0
smmap5.0.1
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
SQLAlchemy2.0.30
sqlparse0.5.0
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
tensorboard2.16.2
tensorboard-data-server0.7.2
tensorflow2.16.1
tensorflow-intel2.16.1
tensorflow-io-gcs-filesystem0.31.0
termcolor2.4.0
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
time-series-anomaly-detector0.2.7
tldextract5.1.2
toolz0.12.1
torch2.2.2
torchaudio2.2.2
torchvision0.17.2
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
waitress3.0.0
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
wrapt1.16.0
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.10.8

Python engine 3.10.8 + common data science and ML packages

PackageVersion
alembic1.11.1
anytree2.8.0
arrow1.2.3
attrs22.2.0
blis0.7.9
Bottleneck1.3.5
Brotli1.0.9
brotlipy0.7.0
catalogue2.0.8
certifi2022.12.7
cffi1.15.1
chardet5.0.0
charset-normalizer2.1.1
click8.1.3
cloudpickle2.2.1
colorama0.4.6
coloredlogs15.0.1
confection0.0.4
contourpy1.0.7
cycler0.11.0
cymem2.0.7
Cython0.29.28
daal2021.6.0
daal4py2021.6.3
dask2022.10.2
databricks-cli0.17.7
diff-match-patch20200713
dill0.3.6
distributed2022.10.2
docker6.1.3
entrypoints0.4
filelock3.9.1
flashtext2.7
Flask2.2.3
Flask-Compress1.13
flatbuffers23.3.3
fonttools4.39.0
fsspec2023.3.0
gensim4.2.0
gitdb4.0.10
GitPython3.1.31
greenlet2.0.2
HeapDict1.0.1
humanfriendly10.0
idna3.4
importlib-metadata6.7.0
intervaltree3.1.0
itsdangerous2.1.2
jellyfish0.9.0
Jinja23.1.2
jmespath1.0.1
joblib1.2.0
json50.9.10
jsonschema4.16.0
kiwisolver1.4.4
langcodes3.3.0
locket1.0.0
lxml4.9.1
Mako1.2.4
Markdown3.4.3
MarkupSafe2.1.2
mlflow2.4.1
mlxtend0.21.0
mpmath1.3.0
msgpack1.0.5
murmurhash1.0.9
networkx2.8.7
nltk3.7
numpy1.23.4
oauthlib3.2.2
onnxruntime1.13.1
packaging23.0
pandas1.5.1
partd1.3.0
pathy0.10.1
patsy0.5.3
Pillow9.4.0
pip23.0.1
platformdirs2.5.2
plotly5.11.0
ply3.11
preshed3.0.8
protobuf4.22.1
psutil5.9.3
pyarrow12.0.1
pycparser2.21
pydantic1.10.6
pyfpgrowth1.0
PyJWT2.7.0
pyparsing3.0.9
pyreadline33.4.1
pyrsistent0.19.3
python-dateutil2.8.2
pytz2022.7.1
PyWavelets1.4.1
pywin32306
PyYAML6.0
querystring-parser1.2.4
queuelib1.6.2
regex2022.10.31
requests2.28.2
requests-file1.5.1
scikit-learn1.1.3
scipy1.9.3
setuptools67.6.0
sip6.7.3
six1.16.0
smart-open6.3.0
smmap5.0.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.4.2
spacy-legacy3.0.12
spacy-loggers1.0.4
SQLAlchemy2.0.18
sqlparse0.4.4
srsly2.4.5
statsmodels0.13.2
sympy1.11.1
tabulate0.9.0
tbb2021.7.1
tblib1.7.0
tenacity8.2.2
textdistance4.5.0
thinc8.1.9
threadpoolctl3.1.0
three-merge0.1.1
tldextract3.4.0
toml0.10.2
toolz0.12.0
tornado6.1
tqdm4.65.0
typer0.4.2
typing_extensions4.5.0
ujson5.5.0
Unidecode1.3.6
urllib31.26.15
waitress2.1.2
wasabi0.10.1
websocket-client1.6.1
Werkzeug2.2.3
wheel0.40.0
xarray2022.10.0
zict2.2.0
zipp3.15.0

3.10.8 DL

Not supported

3.6.5 (Legacy)

Not supported

This article lists the available managed Python packages in the Python plugin. For more information, see Python plugin.

To create a custom image, see Create a custom image.

3.11.7 (Preview)

Python engine 3.11.7 + common data science and ML packages

PackageVersion
annotated-types0.6.0
anytree2.12.1
arrow1.3.0
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gensim4.3.2
humanfriendly10.0
idna3.7
importlib_metadata7.1.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
locket1.0.0
lxml5.2.1
marisa-trie1.1.0
MarkupSafe2.1.5
matplotlib3.8.4
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
preshed3.0.9
protobuf5.26.1
psutil5.9.8
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
pyparsing3.1.2
pyreadline33.4.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
PyYAML6.0.1
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rpds-py0.18.0
scikit-learn1.4.2
scipy1.13.0
sip6.8.3
six1.16.0
smart-open6.4.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
tldextract5.1.2
toolz0.12.1
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.11.7 DL (Preview)

Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)

PackageVersion
absl-py2.1.0
alembic1.13.1
aniso86019.0.1
annotated-types0.6.0
anytree2.12.1
arch7.0.0
arrow1.3.0
astunparse1.6.3
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
cachetools5.3.3
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
Deprecated1.2.14
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
docker7.1.0
entrypoints0.4
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gast0.5.4
gensim4.3.2
gitdb4.0.11
GitPython3.1.43
google-pasta0.2.0
graphene3.3
graphql-core3.2.3
graphql-relay3.2.0
greenlet3.0.3
grpcio1.64.0
h5py3.11.0
humanfriendly10.0
idna3.7
importlib-metadata7.0.0
iniconfig2.0.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
keras3.3.3
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
libclang18.1.1
locket1.0.0
lxml5.2.1
Mako1.3.5
marisa-trie1.1.0
Markdown3.6
markdown-it-py3.0.0
MarkupSafe2.1.5
matplotlib3.8.4
mdurl0.1.2
ml-dtypes0.3.2
mlflow2.13.0
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
namex0.0.8
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
opentelemetry-api1.24.0
opentelemetry-sdk1.24.0
opentelemetry-semantic-conventions0.45b0
opt-einsum3.3.0
optree0.11.0
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
pluggy1.5.0
preshed3.0.9
protobuf4.25.3
psutil5.9.8
pyarrow15.0.2
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
Pygments2.18.0
pyparsing3.1.2
pyreadline33.4.1
pytest8.2.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
pywin32306
PyYAML6.0.1
querystring-parser1.2.4
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rich13.7.1
rpds-py0.18.0
rstl0.1.3
scikit-learn1.4.2
scipy1.13.0
seasonal0.3.1
sip6.8.3
six1.16.0
smart-open6.4.0
smmap5.0.1
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
SQLAlchemy2.0.30
sqlparse0.5.0
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
tensorboard2.16.2
tensorboard-data-server0.7.2
tensorflow2.16.1
tensorflow-intel2.16.1
tensorflow-io-gcs-filesystem0.31.0
termcolor2.4.0
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
time-series-anomaly-detector0.2.7
tldextract5.1.2
toolz0.12.1
torch2.2.2
torchaudio2.2.2
torchvision0.17.2
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
waitress3.0.0
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
wrapt1.16.0
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.10.8

Python engine 3.10.8 + common data science and ML packages

PackageVersion
alembic1.11.1
anytree2.8.0
arrow1.2.3
attrs22.2.0
blis0.7.9
Bottleneck1.3.5
Brotli1.0.9
brotlipy0.7.0
catalogue2.0.8
certifi2022.12.7
cffi1.15.1
chardet5.0.0
charset-normalizer2.1.1
click8.1.3
cloudpickle2.2.1
colorama0.4.6
coloredlogs15.0.1
confection0.0.4
contourpy1.0.7
cycler0.11.0
cymem2.0.7
Cython0.29.28
daal2021.6.0
daal4py2021.6.3
dask2022.10.2
databricks-cli0.17.7
diff-match-patch20200713
dill0.3.6
distributed2022.10.2
docker6.1.3
entrypoints0.4
filelock3.9.1
flashtext2.7
Flask2.2.3
Flask-Compress1.13
flatbuffers23.3.3
fonttools4.39.0
fsspec2023.3.0
gensim4.2.0
gitdb4.0.10
GitPython3.1.31
greenlet2.0.2
HeapDict1.0.1
humanfriendly10.0
idna3.4
importlib-metadata6.7.0
intervaltree3.1.0
itsdangerous2.1.2
jellyfish0.9.0
Jinja23.1.2
jmespath1.0.1
joblib1.2.0
json50.9.10
jsonschema4.16.0
kiwisolver1.4.4
langcodes3.3.0
locket1.0.0
lxml4.9.1
Mako1.2.4
Markdown3.4.3
MarkupSafe2.1.2
mlflow2.4.1
mlxtend0.21.0
mpmath1.3.0
msgpack1.0.5
murmurhash1.0.9
networkx2.8.7
nltk3.7
numpy1.23.4
oauthlib3.2.2
onnxruntime1.13.1
packaging23.0
pandas1.5.1
partd1.3.0
pathy0.10.1
patsy0.5.3
Pillow9.4.0
pip23.0.1
platformdirs2.5.2
plotly5.11.0
ply3.11
preshed3.0.8
protobuf4.22.1
psutil5.9.3
pyarrow12.0.1
pycparser2.21
pydantic1.10.6
pyfpgrowth1.0
PyJWT2.7.0
pyparsing3.0.9
pyreadline33.4.1
pyrsistent0.19.3
python-dateutil2.8.2
pytz2022.7.1
PyWavelets1.4.1
pywin32306
PyYAML6.0
querystring-parser1.2.4
queuelib1.6.2
regex2022.10.31
requests2.28.2
requests-file1.5.1
scikit-learn1.1.3
scipy1.9.3
setuptools67.6.0
sip6.7.3
six1.16.0
smart-open6.3.0
smmap5.0.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.4.2
spacy-legacy3.0.12
spacy-loggers1.0.4
SQLAlchemy2.0.18
sqlparse0.4.4
srsly2.4.5
statsmodels0.13.2
sympy1.11.1
tabulate0.9.0
tbb2021.7.1
tblib1.7.0
tenacity8.2.2
textdistance4.5.0
thinc8.1.9
threadpoolctl3.1.0
three-merge0.1.1
tldextract3.4.0
toml0.10.2
toolz0.12.0
tornado6.1
tqdm4.65.0
typer0.4.2
typing_extensions4.5.0
ujson5.5.0
Unidecode1.3.6
urllib31.26.15
waitress2.1.2
wasabi0.10.1
websocket-client1.6.1
Werkzeug2.2.3
wheel0.40.0
xarray2022.10.0
zict2.2.0
zipp3.15.0

3.10.8 DL

Python engine 3.10.8 + common data science and ML packages + deep learning packages (tensorflow & torch)

PackageVersion
absl-py1.4.0
alembic1.11.1
anytree2.8.0
arrow1.2.3
astunparse1.6.3
attrs22.1.0
blis0.7.9
Bottleneck1.3.5
Brotli1.0.9
brotlipy0.7.0
cachetools5.3.0
catalogue2.0.8
certifi2022.9.24
cffi1.15.1
chardet5.0.0
charset-normalizer2.1.1
click8.1.3
cloudpickle2.2.0
colorama0.4.6
coloredlogs15.0.1
confection0.0.3
contourpy1.0.6
cycler0.11.0
cymem2.0.7
Cython0.29.28
daal2021.6.0
daal4py2021.6.3
dask2022.10.2
databricks-cli0.17.7
diff-match-patch20200713
dill0.3.6
distributed2022.10.2
docker6.1.3
entrypoints0.4
filelock3.8.0
flashtext2.7
Flask2.2.2
Flask-Compress1.13
flatbuffers22.10.26
fonttools4.38.0
fsspec2022.10.0
gast0.4.0
gensim4.2.0
gitdb4.0.10
GitPython3.1.31
google-auth2.16.2
google-auth-oauthlib0.4.6
google-pasta0.2.0
greenlet2.0.2
grpcio1.51.3
h5py3.8.0
HeapDict1.0.1
humanfriendly10.0
idna3.4
importlib-metadata6.7.0
intervaltree3.1.0
itsdangerous2.1.2
jax0.4.6
jellyfish0.9.0
Jinja23.1.2
jmespath1.0.1
joblib1.2.0
json50.9.10
jsonschema4.16.0
keras2.12.0
kiwisolver1.4.4
langcodes3.3.0
libclang16.0.0
locket1.0.0
lxml4.9.1
Mako1.2.4
Markdown3.4.2
MarkupSafe2.1.1
mlflow2.4.1
mlxtend0.21.0
mpmath1.2.1
msgpack1.0.4
murmurhash1.0.9
networkx2.8.7
nltk3.7
numpy1.23.4
oauthlib3.2.2
onnxruntime1.13.1
opt-einsum3.3.0
packaging21.3
pandas1.5.1
partd1.3.0
pathy0.6.2
patsy0.5.3
Pillow9.3.0
pip23.0.1
platformdirs2.5.2
plotly5.11.0
ply3.11
preshed3.0.8
protobuf4.21.9
psutil5.9.3
pyarrow12.0.1
pyasn10.4.8
pyasn1-modules0.2.8
pycparser2.21
pydantic1.10.2
pyfpgrowth1.0
PyJWT2.7.0
pyparsing3.0.9
pyreadline33.4.1
pyrsistent0.19.1
python-dateutil2.8.2
pytz2022.5
PyWavelets1.4.1
pywin32306
PyYAML6.0
querystring-parser1.2.4
queuelib1.6.2
regex2022.10.31
requests2.28.1
requests-file1.5.1
requests-oauthlib1.3.1
rsa4.9
scikit-learn1.1.3
scipy1.9.3
setuptools67.6.0
sip6.7.3
six1.16.0
smart-open5.2.1
smmap5.0.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.4.2
spacy-legacy3.0.10
spacy-loggers1.0.3
SQLAlchemy2.0.18
sqlparse0.4.4
srsly2.4.5
statsmodels0.13.2
sympy1.11.1
tabulate0.9.0
tbb2021.7.0
tblib1.7.0
tenacity8.1.0
tensorboard2.12.0
tensorboard-data-server0.7.0
tensorboard-plugin-wit1.8.1
tensorflow2.12.0
tensorflow-estimator2.12.0
tensorflow-intel2.12.0
tensorflow-io-gcs-filesystem0.31.0
termcolor2.2.0
textdistance4.5.0
thinc8.1.5
threadpoolctl3.1.0
three-merge0.1.1
tldextract3.4.0
toml0.10.2
toolz0.12.0
torch2.0.0
torchaudio2.0.1
torchvision0.15.1
tornado6.1
tqdm4.64.1
typer0.4.2
typing_extensions4.4.0
ujson5.5.0
Unidecode1.3.6
urllib31.26.12
waitress2.1.2
wasabi0.10.1
websocket-client1.6.1
Werkzeug2.2.2
wheel0.40.0
wrapt1.14.1
xarray2022.10.0
zict2.2.0
zipp3.15.0

3.6.5 (Legacy)

PackageVersion
adal1.2.0
anaconda_navigator1.8.7
anytree2.8.0
argparse1.1
asn1crypto0.24.0
astor0.7.1
astroid1.6.3
astropy3.0.2
attr18.1.0
babel2.5.3
backcall0.1.0
bitarray0.8.1
bleach2.1.3
bokeh0.12.16
boto2.48.0
boto31.9.109
botocore1.12.109
bottleneck1.2.1
bs44.6.0
certifi2018.04.16
cffi1.11.5
cgi2.6
chardet3.0.4
click6.7
cloudpickle0.5.3
clyent1.2.2
colorama0.3.9
conda4.5.4
conda_build3.10.5
conda_env4.5.4
conda_verify2.0.0
Crypto2.6.1
cryptography2.2.2
csv1
ctypes1.1.0
cycler0.10.0
cython0.28.2
Cython0.28.2
cytoolz0.9.0.1
dask0.17.5
datashape0.5.4
dateutil2.7.3
decimal1.7
decorator4.3.0
dill0.2.8.2
distributed1.21.8
distutils3.6.5
docutils0.14
entrypoints0.2.3
et_xmlfile1.0.1
fastcache1.0.2
filelock3.0.4
flask1.0.2
flask_cors3.0.4
future0.17.1
gensim3.7.1
geohash0.8.5
gevent1.3.0
glob2“(0, 6)”
greenlet0.4.13
h5py2.7.1
html5lib1.0.1
idna2.6
imageio2.3.0
imaplib2.58
ipaddress1
IPython6.4.0
ipython_genutils0.2.0
isort4.3.4
jdcal1.4
jedi0.12.0
jinja22.1
jmespath0.9.4
joblib0.13.0
json2.0.9
jsonschema2.6.0
jupyter_core4.4.0
jupyterlab0.32.1
jwt1.7.1
keras2.2.4
keras_applications1.0.6
keras_preprocessing1.0.5
kiwisolver1.0.1
lazy_object_proxy1.3.1
llvmlite0.23.1
logging0.5.1.2
markdown3.0.1
markupsafe1
matplotlib2.2.2
mccabe0.6.1
menuinst1.4.14
mistune0.8.3
mkl1.1.2
mlxtend0.15.0.0
mpmath1.0.0
msrest0.6.2
msrestazure0.6.0
multipledispatch0.5.0
navigator_updater0.2.1
nbconvert5.3.1
nbformat4.4.0
networkx2.1
nltk3.3
nose1.3.7
notebook5.5.0
numba0.38.0
numexpr2.6.5
numpy1.19.1
numpydoc0.8.0
oauthlib2.1.0
olefile0.45.1
onnxruntime1.4.0
openpyxl2.5.3
OpenSSL18.0.0
optparse1.5.3
packaging17.1
pandas0.24.1
parso0.2.0
past0.17.1
path11.0.1
patsy0.5.0
pep81.7.1
phonenumbers8.10.6
pickleshare0.7.4
PIL5.1.0
pint0.8.1
pip21.3.1
plac0.9.6
platform1.0.8
plotly4.8.2
pluggy0.6.0
ply3.11
prompt_toolkit1.0.15
psutil5.4.5
py1.5.3
pycodestyle2.4.0
pycosat0.6.3
pycparser2.18
pyflakes1.6.0
pyfpgrowth1
pygments2.2.0
pylint1.8.4
pyparsing2.2.0
pytest3.5.1
pytest_arraydiff0.2
pytz2018.4
pywt0.5.2
qtconsole4.3.1
re2.2.1
regex2.4.136
requests2.18.4
requests_oauthlib1.0.0
ruamel_yaml0.15.35
s3transfer0.2.0
sandbox_utils1.2
scipy1.1.0
scrubadub1.2.0
setuptools39.1.0
six1.11.0
sklearn0.20.3
socketserver0.4
socks1.6.7
sortedcollections0.6.1
sortedcontainers1.5.10
spacy2.0.18
sphinx1.7.4
spyder3.2.8
sqlalchemy1.2.7
statsmodels0.9.0
surprise1.0.6
sympy1.1.1
tables3.4.3
tabnanny6
tblib1.3.2
tensorflow1.12.0
terminado0.8.1
testpath0.3.1
textblob0.10.0
tlz0.9.0.1
toolz0.9.0
torch1.0.0
tqdm4.31.1
traitlets4.3.2
ujson1.35
unicodecsv0.14.1
urllib31.22
werkzeug0.14.1
wheel0.31.1
widgetsnbextension3.2.1
win32rcparser0.11
winpty0.5.1
wrapt1.10.11
xgboost0.81
xlsxwriter1.0.4
yaml3.12
zict0.1.3

9.3.3 - R plugin (Preview)

Learn how to use the R plugin (Preview) to run a user-defined function using an R script.

The R plugin runs a user-defined function (UDF) using an R script.

The script gets tabular data as its input, and produces tabular output. The plugin’s runtime is hosted in a sandbox on the cluster’s nodes. The sandbox provides an isolated and secure environment.

Syntax

T | evaluate [hint.distribution = (single | per_node)] r(output_schema, script [, script_parameters] [, external_artifacts])

Parameters

NameTypeRequiredDescription
output_schemastring✔️A type literal that defines the output schema of the tabular data, returned by the R code. The format is: typeof(ColumnName: ColumnType[, …]). For example: typeof(col1:string, col2:long). To extend the input schema, use the following syntax: typeof(*, col1:string, col2:long).
scriptstring✔️The valid R script to be executed.
script_parametersdynamicA property bag of name and value pairs to be passed to the R script as the reserved kargs dictionary. For more information, see Reserved R variables.
hint.distributionstringHint for the plugin’s execution to be distributed across multiple cluster nodes. The default value is single. single means that a single instance of the script will run over the entire query data. per_node means that if the query before the R block is distributed, an instance of the script will run on each node over the data that it contains.
external_artifactsdynamicA property bag of name and URL pairs for artifacts that are accessible from cloud storage. They can be made available for the script to use at runtime. URLs referenced in this property bag are required to be included in the cluster’s callout policy and in a publicly available location, or contain the necessary credentials, as explained in storage connection strings. The artifacts are made available for the script to consume from a local temporary directory, .\Temp. The names provided in the property bag are used as the local file names. See Example. For more information, see Install packages for the R plugin.

Reserved R variables

The following variables are reserved for interaction between Kusto Query Language and the R code:

  • df: The input tabular data (the values of T above), as an R DataFrame.
  • kargs: The value of the script_parameters argument, as an R dictionary.
  • result: An R DataFrame created by the R script. The value becomes the tabular data that gets sent to any Kusto query operator that follows the plugin.

Enable the plugin

R sandbox image

Examples

range x from 1 to 360 step 1
| evaluate r(
//
typeof(*, fx:double),               //  Output schema: append a new fx column to original table 
//
'result <- df\n'                    //  The R decorated script
'n <- nrow(df)\n'
'g <- kargs$gain\n'
'f <- kargs$cycles\n'
'result$fx <- g * sin(df$x / n * 2 * pi * f)'
//
, bag_pack('gain', 100, 'cycles', 4)    //  dictionary of parameters
)
| render linechart 

Sine demo.

Performance tips

  • Reduce the plugin’s input dataset to the minimum amount required (columns/rows).

  • Use filters on the source dataset using the Kusto Query Language, when possible.

  • To make a calculation on a subset of the source columns, project only those columns before invoking the plugin.

  • Use hint.distribution = per_node whenever the logic in your script is distributable.

  • You can also use the partition operator for partitioning the input dataset.

  • Whenever possible, use the Kusto Query Language to implement the logic of your R script.

    For example:

    .show operations
    | where StartedOn > ago(1d) // Filtering out irrelevant records before invoking the plugin
    | project d_seconds = Duration / 1s // Projecting only a subset of the necessary columns
    | evaluate hint.distribution = per_node r( // Using per_node distribution, as the script's logic allows it
        typeof(*, d2:double),
        'result <- df\n'
        'result$d2 <- df$d_seconds\n' // Negative example: this logic should have been written using Kusto's query language
      )
    | summarize avg = avg(d2)
    

Usage tips

  • To avoid conflicts between Kusto string delimiters and R string delimiters:

    • Use single quote characters (') for Kusto string literals in Kusto queries.
    • Use double quote characters (") for R string literals in R scripts.
  • Use the externaldata operator to obtain the content of a script that you’ve stored in an external location, such as Azure Blob Storage or a public GitHub repository.

    For example:

    let script = 
        externaldata(script:string)
        [h'https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r']
        with(format = raw);
    range x from 1 to 360 step 1
    | evaluate r(
        typeof(*, fx:double),
        toscalar(script), 
        bag_pack('gain', 100, 'cycles', 4))
    | render linechart 
    

Install packages for the R plugin

Follow these step-by-step instructions to install packages that aren’t included in the plugin’s base image.

Prerequisites

  1. Create a blob container to host the packages, preferably in the same place as your cluster. For example, https://artifactswestus.blob.core.windows.net/r, assuming your cluster is in West US.

  2. Alter the cluster’s callout policy to allow access to that location.

    • This change requires AllDatabasesAdmin permissions.

    • For example, to enable access to a blob located in https://artifactswestus.blob.core.windows.net/r, run the following command:

    .alter-merge cluster policy callout @'[ { "CalloutType": "sandbox_artifacts", "CalloutUriRegex": "artifactswestus\\.blob\\.core\\.windows\\.net/r/","CanCall": true } ]'
    

Install packages

The example snippets below assume a local R installation on a Windows environment.

  1. Verify you’re using the appropriate R version – current R Sandbox version is 3.4.4:

    > R.Version()["version.string"]
    
    $version.string
    [1] "R version 3.4.4 (2018-03-15)"
    

    If needed you can download it from here.

  2. Launch the x64 RGui

  3. Create a new empty folder to be populated with all the relevant packages you would like to install. In this example, we install the brglm2 package, so we create C:\brglm2.

  4. Add the newly created folder path to lib paths:

    > .libPaths("C://brglm2")
    
  5. Verify that the new folder is now the first path in .libPaths():

    > .libPaths()
    
    [1] "C:/brglm2"    "C:/Program Files/R/R-3.4.4/library"
    
  6. Once this setup is done, any package that we install will be added to this new folder. Let’s install the requested package and its dependencies:

    > install.packages("brglm2")
    

    In case the question “Do you want to install from sources the packages which need compilation?” pops up, answer “Y”.

  7. Verify that new folders were added to “C:\brglm2”:

    Screenshot of library directory content.

  8. Select all items in that folder and zip them to e.g. libs.zip (do not zip the parent folder). You should get an archive structure like this:

    libs.zip:

    • brglm2 (folder)
    • enrichwith (folder)
    • numDeriv (folder)
  9. Upload libs.zip to the blob container that was set above

  10. Call the r plugin.

    • Specify the external_artifacts parameter with a property bag of name and reference to the ZIP file (the blob’s URL, including a SAS token).
    • In your inline r code, import zipfile from sandboxutils and call its install() method with the name of the ZIP file.

Example

Install the brglm2 package:

print x=1
| evaluate r(typeof(*, ver:string),
    'library(sandboxutils)\n'
    'zipfile.install("brglm2.zip")\n'
    'library("brglm2")\n'
    'result <- df\n'
    'result$ver <-packageVersion("brglm2")\n'
    ,external_artifacts=bag_pack('brglm2.zip', 'https://artifactswestus.blob.core.windows.net/r/libs.zip?*** REPLACE WITH YOUR SAS TOKEN ***'))
xver
11.8.2

Make sure that the archive’s name (first value in pack pair) has the *.zip suffix to prevent collisions when unzipping folders whose name is identical to the archive name.

9.4 - Machine learning plugins

9.4.1 - autocluster plugin

Learn how to use the autocluster plugin to find common patterns in data.

autocluster finds common patterns of discrete attributes (dimensions) in the data. It then reduces the results of the original query, whether it’s 100 or 100,000 rows, to a few patterns. The plugin was developed to help analyze failures (such as exceptions or crashes) but can potentially work on any filtered dataset. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate autocluster ([SizeWeight [, WeightColumn [, NumSeeds [, CustomWildcard [, … ]]]]])

Parameters

The parameters must be ordered as specified in the syntax. To indicate that the default value should be used, pass the tilde string value '~'. For more information, see Examples.

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
SizeWeightdoubleA double between 0 and 1 that controls the balance between generic (high coverage) and informative (many shared) values. Increasing this value typically reduces the quantity of patterns while expanding coverage. Conversely, decreasing this value generates more specific patterns characterized by increased shared values and a smaller percentage coverage. The default is 0.5. The formula is a weighted geometric mean with weights SizeWeight and 1-SizeWeight.
WeightColumnstringConsiders each row in the input according to the specified weight. Each row has a default weight of 1. The argument must be a name of a numeric integer column. A common usage of a weight column is to take into account sampling or bucketing or aggregation of the data that is already embedded into each row.
NumSeedsintDetermines the number of initial local search points. Adjusting the number of seeds impacts result quantity or quality based on data structure. Increasing seeds can enhance results but with a slower query tradeoff. Decreasing below five yields negligible improvements, while increasing above 50 rarely generates more patterns. The default is 25.
CustomWildcardstringA type literal that sets the wildcard value for a specific type in the results table, indicating no restriction on this column. The default is null; for string columns, the default is an empty string. If the default is a good value in the data, a different wildcard value should be used, such as *. You can include multiple custom wildcards by adding them consecutively.
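
For instance, a hedged sketch of the positional convention: keep the defaults for SizeWeight and WeightColumn by passing '~', and raise NumSeeds to 40 (the seed value is only illustrative):

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster('~', '~', 40)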

Returns

The autocluster plugin usually returns a small set of patterns. The patterns capture portions of the data with shared common values across multiple discrete attributes. Each pattern in the results is represented by a row.

The first column is the segment ID. The next two columns are the count and percentage of rows out of the original query that are captured by the pattern. The remaining columns are from the original query. Their value is either a specific value from the column or a wildcard value (null by default), meaning variable values.

The patterns aren’t distinct, may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.

Examples

Using evaluate

T | evaluate autocluster()

Using autocluster

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)

Output

SegmentIdCountPercentStateEventTypeDamage
0227838.7HailNO
15128.7Thunderstorm WindYES
289815.3TEXAS

Using custom wildcards

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')

Output

SegmentIdCountPercentStateEventTypeDamage
0227838.7*HailNO
15128.7*Thunderstorm WindYES
289815.3TEXAS**

9.4.2 - basket plugin

Learn how to use the basket plugin to find frequent patterns in data that exceed a frequency threshold.

The basket plugin finds frequent patterns of attributes in the data and returns the patterns that pass a frequency threshold in that data. A pattern represents a subset of the rows that have the same value across one or more columns. The basket plugin is based on the Apriori algorithm originally developed for basket analysis data mining.

Syntax

T | evaluate basket ([ Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, … ])

Parameters

NameTypeRequiredDescription
ThresholdlongA double in the range of 0.015 to 1 that sets the minimal ratio of the rows to be considered frequent. Patterns with a smaller ratio aren’t returned. The default value is 0.05. To use the default value, input the tilde: ~.
WeightColumnstringThe name of a numeric column (such as int, long, or real) used to weight each row in the input. By default, each row has a weight of 1. A common use of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row. To use the default value, input the tilde: ~.
MaxDimensionsintSets the maximal number of uncorrelated dimensions per basket, limited by default to minimize the query runtime. The default is 5. To use the default value, input the tilde: ~.
CustomWildcardstringSets the wildcard value for a specific type in the result table that indicates that the current pattern doesn’t have a restriction on this column. The default is null, except for string columns, whose default value is an empty string. If the default is a good value in the data, a different wildcard value should be used, such as *. To use the default value, input the tilde: ~.
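
For instance, a hedged sketch of weighting pre-aggregated rows: keep the default Threshold by passing '~' and supply a weight column. This assumes the weight column is passed as a bare column reference; the aggregation itself is only illustrative:

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| summarize EventCount = count() by State, EventType, Damage
| evaluate basket('~', EventCount)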

Returns

The basket plugin returns frequent patterns that pass a ratio threshold. The default threshold is 0.05.

Each pattern is represented by a row in the results. The first column is the segment ID. The next two columns are the count and percentage of rows from the original query that match the pattern. The remaining columns relate to the original query, with either a specific value from the column or a wildcard value (null by default), meaning a variable value.

Example

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)

Output

SegmentIdCountPercentStateEventTypeDamageDamageCrops
0457477.7NO0
1227838.7HailNO0
2567596.40
3237140.3Hail0
4127921.7Thunderstorm Wind0
5246841.9Hail
6131022.3YES
7129121.9Thunderstorm Wind

Example with custom wildcards

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2, '~', '~', '*', int(-1))

Output

SegmentIdCountPercentStateEventTypeDamageDamageCrops
0457477.7**NO0
1227838.7*HailNO0
2567596.4***0
3237140.3*Hail*0
4127921.7*Thunderstorm Wind*0
5246841.9*Hail*-1
6131022.3**YES-1
7129121.9*Thunderstorm Wind*-1

9.4.3 - diffpatterns plugin

Learn how to use the diffpatterns plugin to compare two datasets of the same structure to find the differences between the two datasets.

Compares two datasets of the same structure and finds patterns of discrete attributes (dimensions) that characterize differences between the two datasets. The plugin is invoked with the evaluate operator.

diffpatterns was developed to help analyze failures (for example, by comparing failures to non-failures in a given time frame), but can potentially find differences between any two datasets of the same structure.

Syntax

T | evaluate diffpatterns(SplitColumn, SplitValueA, SplitValueB [, WeightColumn, Threshold, MaxDimensions, CustomWildcard, …])

Parameters

NameTypeRequiredDescription
SplitColumnstring✔️The column name that tells the algorithm how to split the query into datasets. According to the specified values for the SplitValueA and SplitValueB arguments, the algorithm splits the query into two datasets, “A” and “B”, and analyzes the differences between them. As such, the split column must have at least two distinct values.
SplitValueAstring✔️A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered dataset “A”.
SplitValueBstring✔️A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered dataset “B”.
WeightColumnstringThe column used to weight each row in the input. Must be the name of a numeric column, such as int, long, or real. By default, each row has a weight of 1. To use the default value, input the tilde: ~. A common usage of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.
ThresholdrealA real in the range of 0.015 to 1 that sets the minimal pattern ratio difference between the two sets. The default is 0.05. To use the default value, input the tilde: ~.
MaxDimensionsintSets the maximum number of uncorrelated dimensions per result pattern. By specifying a limit, you decrease the query runtime. The default is unlimited. To use the default value, input the tilde: ~.
CustomWildcardstringSets the wildcard value for a specific type in the result table that indicates that the current pattern doesn’t have a restriction on this column. The default is null, except for string columns for which the default is an empty string. If the default is a viable value in the data, a different wildcard value should be used, such as *. To use the default value, input the tilde: ~.

Returns

diffpatterns returns a small set of patterns that capture different portions of the data in the two sets (that is, a pattern capturing a large percentage of the rows in the first dataset and low percentage of the rows in the second set). Each pattern is represented by a row in the results.

The result of diffpatterns returns the following columns:

  • SegmentId: the identity assigned to the pattern in the current query (note: IDs aren’t guaranteed to be the same in repeating queries).

  • CountA: the number of rows captured by the pattern in Set A (Set A is the equivalent of where tostring(splitColumn) == SplitValueA).

  • CountB: the number of rows captured by the pattern in Set B (Set B is the equivalent of where tostring(splitColumn) == SplitValueB).

  • PercentA: the percentage of rows in Set A captured by the pattern (100.0 * CountA / count(SetA)).

  • PercentB: the percentage of rows in Set B captured by the pattern (100.0 * CountB / count(SetB)).

  • PercentDiffAB: the absolute percentage point difference between A and B (|PercentA - PercentB|) is the main measure of significance of patterns in describing the difference between the two sets.

  • Rest of the columns: the original schema of the input, describing the pattern. Each row (pattern) represents the intersection of the non-wildcard values of the columns (equivalent to where col1 == val1 and col2 == val2 and ... colN == valN for each non-wildcard value in the row).

For each pattern, columns that aren’t set in the pattern (that is, without restriction on a specific value) contain a wildcard value, which is null by default. See the CustomWildcard parameter above for how wildcards can be changed manually.

  • Note: the patterns are often not distinct. They may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.
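
Because the plugin’s output is a regular table with the columns described above, it can be post-processed like any other query result. For example, a minimal sketch that keeps only the more significant patterns, ordered by PercentDiffAB (the 10 percentage-point cutoff is illustrative):

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , 1 , 0)
| project State , EventType , Source , Damage, DamageCrops
| evaluate diffpatterns(Damage, "0", "1")
| where PercentDiffAB > 10
| order by PercentDiffAB desc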

Example

StormEvents 
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , 1 , 0)
| project State , EventType , Source , Damage, DamageCrops
| evaluate diffpatterns(Damage, "0", "1" )

Output

SegmentIdCountACountBPercentAPercentBPercentDiffABStateEventTypeSourceDamageCrops
022789349.87.142.7Hail0
177951217.0339.0822.05Thunderstorm Wind
2109811824.019.0115Trained Spotter0
31361582.9712.069.09Newspaper
43592147.8516.348.49Flash Flood
5501221.099.318.22IOWA
665527914.3221.36.98Law Enforcement
71501173.288.935.65Flood
83621767.9113.445.52Emergency Manager

9.4.4 - diffpatterns_text plugin

Learn how to use the diffpatterns_text plugin to compare two string value datasets to find the differences between the two datasets.

Compares two datasets of string values and finds text patterns that characterize differences between the two datasets. The plugin is invoked with the evaluate operator.

The diffpatterns_text returns a set of text patterns that capture different portions of the data in the two sets. For example, a pattern capturing a large percentage of the rows when the condition is true and low percentage of the rows when the condition is false. The patterns are built from consecutive tokens separated by white space, with a token from the text column or a * representing a wildcard. Each pattern is represented by a row in the results.

Syntax

T | evaluate diffpatterns_text(TextColumn, BooleanCondition [, MinTokens, Threshold , MaxTokens])

Parameters

NameTypeRequiredDescription
TextColumnstring✔️The text column to analyze.
BooleanConditionstring✔️An expression that evaluates to a boolean value. The algorithm splits the query into the two datasets to compare based on this expression.
MinTokensintAn integer value between 0 and 200 that represents the minimal number of non-wildcard tokens per result pattern. The default is 1.
ThresholddecimalA decimal value between 0.015 and 1 that sets the minimal pattern ratio difference between the two sets. Default is 0.05. See diffpatterns.
MaxTokensintAn integer value between 0 and 20 that sets the maximal number of tokens per result pattern, specifying a lower limit decreases the query runtime.
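A hedged sketch that passes the optional arguments positionally, reusing the StormEvents query from the Example section below; it keeps MinTokens at 2, raises Threshold to 0.1, and caps patterns at five tokens (the specific values are illustrative):

StormEvents
| where EventNarrative != "" and monthofyear(StartTime) > 1 and monthofyear(StartTime) < 9
| where EventType == "Drought" or EventType == "Extreme Cold/Wind Chill"
| evaluate diffpatterns_text(EpisodeNarrative, EventType == "Extreme Cold/Wind Chill", 2, 0.1, 5)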

Returns

The result of diffpatterns_text returns the following columns:

  • Count_of_True: The number of rows matching the pattern when the condition is true.
  • Count_of_False: The number of rows matching the pattern when the condition is false.
  • Percent_of_True: The percentage of rows matching the pattern from the rows when the condition is true.
  • Percent_of_False: The percentage of rows matching the pattern from the rows when the condition is false.
  • Pattern: The text pattern containing tokens from the text string and ‘*’ for wildcards.

Example

The following example uses data from the StormEvents table in the help cluster. To access this data, sign in to https://dataexplorer.azure.com/clusters/help/databases/Samples. In the left menu, browse to help > Samples > Tables > Storm_Events.

The examples in this tutorial use the StormEvents table, which is publicly available in the Weather analytics sample data.

StormEvents     
| where EventNarrative != "" and monthofyear(StartTime) > 1 and monthofyear(StartTime) < 9
| where EventType == "Drought" or EventType == "Extreme Cold/Wind Chill"
| evaluate diffpatterns_text(EpisodeNarrative, EventType == "Extreme Cold/Wind Chill", 2)

Output

Count_of_TrueCount_of_FalsePercent_of_TruePercent_of_FalsePattern
1106.290Winds shifting northwest in * wake * a surface trough brought heavy lake effect snowfall downwind * Lake Superior from
905.140Canadian high pressure settled * * region * produced the coldest temperatures since February * 2006. Durations * freezing temperatures
03406.24* * * * * * * * * * * * * * * * * * West Tennessee,
04207.71* * * * * * caused * * * * * * * * across western Colorado. *
04508.26* * below normal *
0110020.18Below normal *

9.5 - Query connectivity plugins

9.5.1 - ai_embed_text plugin (Preview)

Learn how to use the ai_embed_text plugin to embed text via language models, enabling various AI-related scenarios such as RAG application and semantic search.

The ai_embed_text plugin allows embedding of text using language models, enabling various AI-related scenarios such as Retrieval Augmented Generation (RAG) applications and semantic search. The plugin supports Azure OpenAI Service embedding models accessed using managed identity.

Prerequisites

Syntax

evaluate ai_embed_text (text, connectionString [, options [, IncludeErrorMessages]])

Parameters

NameTypeRequiredDescription
textstring✔️The text to embed. The value can be a column reference or a constant scalar.
connectionStringstring✔️The connection string for the language model in the format <ModelDeploymentUri>;<AuthenticationMethod>; replace <ModelDeploymentUri> and <AuthenticationMethod> with the AI model deployment URI and the authentication method respectively.
optionsdynamicThe options that control calls to the embedding model endpoint. See Options.
IncludeErrorMessagesboolIndicates whether to output errors in a new column in the output table. Default value: false.

Options

The following table describes the options that control the way the requests are made to the embedding model endpoint.

NameTypeDescription
RecordsPerRequestintSpecifies the number of records to process per request. Default value: 1.
CharsPerRequestintSpecifies the maximum number of characters to process per request. Default value: 0 (unlimited). Azure OpenAI counts tokens, with each token approximately translating to four characters.
RetriesOnThrottlingintSpecifies the number of retry attempts when throttling occurs. Default value: 0.
GlobalTimeouttimespanSpecifies the maximum time to wait for a response from the embedding model. Default value: null
ModelParametersdynamicParameters specific to the embedding model, such as embedding dimensions or user identifiers for monitoring purposes. Default value: null.
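ModelParameters is forwarded to the embedding endpoint as-is. As a sketch, the following options bag batches five records per call, retries twice on throttling, and asks a text-embedding-3 deployment for 256-dimensional vectors; the deployment URI and the dimensions value are illustrative assumptions, not requirements of the plugin.

let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
    "RecordsPerRequest": 5,
    "RetriesOnThrottling": 2,
    "GlobalTimeout": 1m,
    "ModelParameters": { "dimensions": 256 }
});
evaluate ai_embed_text('Embed this text using AI', connectionString, options)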

Configure managed identity and callout policies

To use the ai_embed_text plugin, you must configure the following policies:

  • managed identity: Allow the system-assigned managed identity to authenticate to Azure OpenAI services.
  • callout: Authorize the AI model endpoint domain.

To configure these policies, use the commands in the following steps:

  1. Configure the managed identity:

    .alter-merge cluster policy managed_identity
    ```
    [
      {
        "ObjectId": "system",
        "AllowedUsages": "AzureAI"
      }
    ]
    ```
    
  2. Configure the callout policy:

    .alter-merge cluster policy callout
    ```
    [
        {
            "CalloutType": "azure_openai",
            "CalloutUriRegex": "https://[A-Za-z0-9\\-]{3,63}\\.openai\\.azure\\.com/.*",
            "CanCall": true
        }
    ]
    ```
    

Returns

Returns the following new embedding columns:

  • A column with the _embedding suffix that contains the embedding values
  • If configured to return errors, a column with the _embedding_error suffix, which contains error strings or is left empty if the operation is successful.

Depending on the input type, the plugin returns different results:

  • Column reference: Returns one or more records with additional columns that are prefixed by the reference column name. For example, if the input column is named TextData, the output columns are named TextData_embedding and, if configured to return errors, TextData_embedding_error.
  • Constant scalar: Returns a single record with additional columns that are not prefixed. The column names are _embedding and, if configured to return errors, _embedding_error.

Examples

The following example embeds the text Embed this text using AI using the Azure OpenAI Embedding model.

let expression = 'Embed this text using AI';
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
evaluate ai_embed_text(expression, connectionString)

The following example embeds multiple texts using the Azure OpenAI Embedding model.

let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
    "RecordsPerRequest": 10,
    "CharsPerRequest": 10000,
    "RetriesOnThrottling": 1,
    "GlobalTimeout": 2m
});
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
| evaluate ai_embed_text(TextData, connectionString, options, true)

Best practices

Azure OpenAI embedding models are subject to heavy throttling, and frequent calls to this plugin can quickly reach throttling limits.

To efficiently use the ai_embed_text plugin while minimizing throttling and costs, follow these best practices:

  • Control request size: Adjust the number of records (RecordsPerRequest) and characters per request (CharsPerRequest).
  • Control query timeout: Set GlobalTimeout to a value lower than the query timeout to ensure progress isn’t lost on successful calls up to that point.
  • Handle rate limits more gracefully: Set retries on throttling (RetriesOnThrottling).

9.5.2 - azure_digital_twins_query_request plugin

Learn how to use the azure_digital_twins_query_request plugin to run an Azure Digital Twins query as part of a Kusto query.

The azure_digital_twins_query_request plugin runs an Azure Digital Twins query as part of a Kusto Query Language (KQL) query. The plugin is invoked with the evaluate operator.

Using the plugin, you can query across data in both Azure Digital Twins and any data source accessible through KQL. For example, you can perform time series analytics.

For more information about the plugin, see Azure Digital Twins query plugin.

Syntax

evaluate azure_digital_twins_query_request ( AdtInstanceEndpoint , AdtQuery )

Parameters

NameTypeRequiredDescription
AdtInstanceEndpointstring✔️The Azure Digital Twins instance endpoint to be queried.
AdtQuerystring✔️The query to run against the Azure Digital Twins endpoint. This query is written in a custom SQL-like query language for Azure Digital Twins, called the Azure Digital Twins query language. For more information, see Query language for Azure Digital Twins.

Authentication and authorization

The azure_digital_twins_query_request plugin uses the Microsoft Entra account of the user running the query to authenticate. To run a query, a user must at least be granted the Azure Digital Twins Data Reader role. Information on how to assign this role can be found in Security for Azure Digital Twins solutions.

Examples

The following examples show how you can run various Azure Digital Twins queries, including queries that use additional Kusto expressions.

Retrieval of all twins within an Azure Digital Twins instance

The following example returns all digital twins within an Azure Digital Twins instance.

evaluate azure_digital_twins_query_request(
  'https://contoso.api.wcus.digitaltwins.azure.net',
  'SELECT T AS Twins FROM DIGITALTWINS T')

Screenshot of the twins present in the Azure Digital Twins instance.

Projection of twin properties as columns along with additional Kusto expressions

The following example returns the result from the plugin as separate columns, and then performs additional operations using Kusto expressions.

evaluate azure_digital_twins_query_request(
  'https://contoso.api.wcus.digitaltwins.azure.net',
  'SELECT T.Temperature, T.Humidity FROM DIGITALTWINS T WHERE IS_PRIMITIVE(T.Temperature) AND IS_PRIMITIVE(T.Humidity)')
| where Temperature > 20
| project TemperatureInC = Temperature, Humidity

Output

TemperatureInCHumidity
2148
4934
8032

Perform time series analytics

You can use the data history integration feature of Azure Digital Twins to historize digital twin property updates. To learn how to view the historized twin updates, see View the historized twin updates.
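As a hedged sketch of that pattern, you can join twin IDs returned by the plugin against telemetry that is already ingested into your cluster. The TwinTelemetry table and its TwinId, Timestamp, and Temperature columns below are illustrative and not part of the plugin:

evaluate azure_digital_twins_query_request(
  'https://contoso.api.wcus.digitaltwins.azure.net',
  'SELECT T.$dtId AS TwinId FROM DIGITALTWINS T')
| extend TwinId = tostring(TwinId)
| join kind=inner (TwinTelemetry | where Timestamp > ago(1d)) on TwinId
| summarize AvgTemperature = avg(Temperature) by TwinId, bin(Timestamp, 1h)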

9.5.3 - cosmosdb_sql_request plugin

Learn how to use the cosmosdb_sql_request plugin to send a SQL query to an Azure Cosmos DB SQL network endpoint to query small datasets.

The cosmosdb_sql_request plugin sends a SQL query to an Azure Cosmos DB SQL network endpoint and returns the results of the query. This plugin is primarily designed for querying small datasets, for example, enriching data with reference data stored in Azure Cosmos DB. The plugin is invoked with the evaluate operator.

Syntax

evaluate cosmosdb_sql_request ( ConnectionString , SqlQuery [, SqlParameters [, Options]] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points to the Azure Cosmos DB collection to query. It must include AccountEndpoint, Database, and Collection. It might include AccountKey if a master key is used for authentication. For more information, see Authentication and authorization.
Example: 'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>'
SqlQuerystring✔️The query to execute.
SqlParametersdynamicThe property bag object to pass as parameters along with the query. Parameter names must begin with @.
OutputSchemaThe names and types of the expected columns of the cosmosdb_sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ). Specifying this parameter enables multiple query optimizations.
OptionsdynamicA property bag object of advanced settings. If an AccountKey isn’t provided in the ConnectionString, then the armResourceId field of this parameter is required. For more information, see Supported options.

Supported options

The following table describes the supported fields of the Options parameter.

NameTypeDescription
armResourceIdstringThe Azure Resource Manager resource ID of the Cosmos DB database. If an account key isn’t provided in the connection string argument, this field is required. In such a case, the armResourceId is used to authenticate to Cosmos DB.
Example: armResourceId='/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>'
tokenstringA Microsoft Entra access token of a principal with access to the Cosmos DB database. This token is used along with the armResourceId to authenticate with the Azure Resource Manager. If unspecified, the token of the principal that made the query is used.

If armResourceId isn’t specified, the token is used directly to access the Cosmos DB database. For more information about the token authentication method, see Authentication and authorization.
preferredLocationsstringThe region from which to query the data.
Example: ['East US']

Authentication and authorization

To authorize to an Azure Cosmos DB SQL network endpoint, you need to specify the authorization information. The following table provides the supported authentication methods and the description for how to use that method.

Authentication methodDescription
Managed identity (Recommended)Append Authentication="Active Directory Managed Identity";User Id={object_id}; to the connection string. The request is made on behalf of a managed identity which must have the appropriate permissions to the database.
To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy.
Azure Resource Manager resource IDThis authentication method requires specifying the armResourceId and optionally the token in the options. The armResourceId identifies the Cosmos DB database account, and the token must be a valid Microsoft Entra bearer token for a principal with access permissions to the Cosmos DB database. If no token is provided, the Microsoft Entra token of the requesting principal will be used for authentication.
Account keyYou can add the account key directly to the ConnectionString argument. However, this approach is less secure as it involves including the secret in the query text, and is less resilient to future changes in the account key. To enhance security, hide the secret as an obfuscated string literal.
TokenYou can add a token value in the plugin options. The token must belong to a principal with relevant permissions. To enhance security, hide the token as an obfuscated string literal.
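As a sketch, a managed identity connection string appends the authentication fragment described above to the endpoint details; the account, database, collection, and object ID are placeholders:

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;Authentication="Active Directory Managed Identity";User Id=<ObjectId>;',
    'SELECT c.Id, c.Name FROM c') : (Id:long, Name:string)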

Set callout policy

The plugin makes callouts to the Azure Cosmos DB instance. Make sure that the cluster’s callout policy enables calls of type cosmosdb to the target CosmosDbUri.

The following example shows how to define the callout policy for Azure Cosmos DB. It’s recommended to restrict it to specific endpoints (my_endpoint1, my_endpoint2).

[
  {
    "CalloutType": "CosmosDB",
    "CalloutUriRegex": "my_endpoint1\\.documents\\.azure\\.com",
    "CanCall": true
  },
  {
    "CalloutType": "CosmosDB",
    "CalloutUriRegex": "my_endpoint2\\.documents\\.azure\\.com",
    "CanCall": true
  }
]

The following example shows an .alter callout policy command for the cosmosdb CalloutType:

.alter cluster policy callout @'[{"CalloutType": "cosmosdb", "CalloutUriRegex": "\\.documents\\.azure\\.com", "CanCall": true}]'

Examples

The following examples use placeholder text in angle brackets.

Query Azure Cosmos DB with a query-defined output schema

The following example uses the cosmosdb_sql_request plugin to send a SQL query while selecting only specific columns. This query uses explicit schema definitions that allow various optimizations before the actual query is run against Cosmos DB.

evaluate cosmosdb_sql_request(
  'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
  'SELECT c.Id, c.Name from c') : (Id:long, Name:string) 

Query Azure Cosmos DB

The following example uses the cosmosdb_sql_request plugin to send a SQL query to fetch data from Azure Cosmos DB using its Azure Cosmos DB for NoSQL.

evaluate cosmosdb_sql_request(
  'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
  'SELECT * from c') // OutputSchema is unknown, so it is not specified. This may harm the performance of the query.

Query Azure Cosmos DB with parameters

The following example uses SQL query parameters and queries the data from an alternate region. For more information, see preferredLocations.

evaluate cosmosdb_sql_request(
   'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
    "SELECT c.id, c.lastName, @param0 as Column0 FROM c WHERE c.dob >= '1970-01-01T00:00:00Z'",
    dynamic({'@param0': datetime(2019-04-16 16:47:26.7423305)}),
    dynamic({'preferredLocations': ['East US']})) : (Id:long, Name:string, Column0: datetime) 
| where lastName == 'Smith'

Query Azure Cosmos DB and join data with a database table

The following example joins partner data from an Azure Cosmos DB with partner data in a database using the Partner field. It results in a list of partners with their phone numbers, website, and contact email address sorted by partner name.

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
    'SELECT c.id, c.Partner, c.phoneNumber FROM c') : (Id:long, Partner:string, phoneNumber:string) 
| join kind=innerunique Partner on Partner
| project Id, Partner, phoneNumber, website, Contact
| sort by Partner

Query Azure Cosmos DB using token authentication

The following example sends a SQL query to Azure Cosmos DB and authenticates with a Microsoft Entra access token supplied in the token option of the plugin.

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
    "SELECT c.Id, c.Name, c.City FROM c",
    dynamic(null),
    dynamic({'token': h'abc123...'})
) : (Id:long, Name:string, City:string)

Query Azure Cosmos DB using Azure Resource Manager resource ID for authentication

The following example uses the Azure Resource Manager resource ID for authentication and the Microsoft Entra token of the requesting principal, since a token isn’t specified. It sends a SQL query while selecting only specific columns and specifies explicit schema definitions.

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
    "SELECT c.Id, c.Name, c.City FROM c",
    dynamic({'armResourceId': '/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>'})
) : (Id:long, Name:string, City:string)

9.5.4 - http_request plugin

Learn how to use the http_request plugin to send an HTTP request and convert the response into a table.

The http_request plugin sends an HTTP GET request and converts the response into a table.

Prerequisites

Syntax

evaluate http_request ( Uri [, RequestHeaders [, Options]] )

Parameters

NameTypeRequiredDescription
Uristring✔️The destination URI for the HTTP or HTTPS request.
RequestHeadersdynamicA property bag containing HTTP headers to send with the request.
OptionsdynamicA property bag containing additional properties of the request.

Authentication and authorization

To authenticate, use the HTTP standard Authorization header or any custom header supported by the web service.

Returns

The plugin returns a table that has a single record with the following dynamic columns:

  • ResponseHeaders: A property bag with the response header.
  • ResponseBody: The response body parsed as a value of type dynamic.

If the HTTP response indicates (via the Content-Type response header) that the media type is application/json, the response body is automatically parsed as if it were a JSON object. Otherwise, it’s returned as-is.

Headers

The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:

NameDescription
x-ms-client-request-idA correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID.
x-ms-readonlyA flag indicating that the processor of this request shouldn’t make any persistent changes.
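As a sketch, custom request headers are supplied as a property bag in the second argument; the header names and values below are illustrative:

let Uri = "https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Azure Purview'";
let RequestHeaders = dynamic({'accept': 'application/json', 'x-ms-correlation-vector': 'abc.0.1.0'});
evaluate http_request(Uri, RequestHeaders)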

Example

The following example retrieves Azure retail prices for Azure Purview in West Europe:

let Uri = "https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Azure Purview' and location eq 'EU West'";
evaluate http_request(Uri)
| project ResponseBody.Items
| mv-expand ResponseBody_Items
| evaluate bag_unpack(ResponseBody_Items)

Output

armRegionNamearmSkuNamecurrencyCodeeffectiveStartDateisPrimaryMeterRegionlocationmeterIdmeterNameproductIdproductNameretailPriceserviceFamilyserviceIdserviceNameskuIdskuNametierMinimumUnitstypeunitOfMeasureunitPrice
westeuropeData InsightsUSD2022-06-01T00:00:00ZfalseEU West8ce915f7-20db-564d-8cc3-5702a7c952abData Insights Report ConsumptionDZH318Z08M22Azure Purview Data Map0.21AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/006CCatalog Insights0Consumption1 API Calls0.21
westeuropeData Map Enrichment - Data Insights GenerationUSD2022-06-01T00:00:00ZfalseEU West7ce2db1d-59a0-5193-8a57-0431a10622b6Data Map Enrichment - Data Insights Generation vCoreDZH318Z08M22Azure Purview Data Map0.82AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/005CData Map Enrichment - Insight Generation0Consumption1 Hour0.82
westeuropeUSD2021-09-28T00:00:00ZfalseEU West053e2dcb-82c0-5e50-86cd-1f1c8d803705Power BI vCoreDZH318Z08M23Azure Purview Scanning Ingestion and Classification0AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M23/0005Power BI0Consumption1 Hour0
westeuropeUSD2021-09-28T00:00:00ZfalseEU Westa7f57f26-5f31-51e5-a5ed-ffc2b0da37b9Resource Set vCoreDZH318Z08M22Azure Purview Data Map0.21AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/000XResource Set0Consumption1 Hour0.21
westeuropeUSD2021-09-28T00:00:00ZfalseEU West5d157295-441c-5ea7-ba7c-5083026dc456SQL Server vCoreDZH318Z08M23Azure Purview Scanning Ingestion and Classification0AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M23/000FSQL Server0Consumption1 Hour0
westeuropeUSD2021-09-28T00:00:00ZfalseEU West0745df0d-ce4f-52db-ac31-ac574d4dcfe5Standard Capacity UnitDZH318Z08M22Azure Purview Data Map0.411AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/0002Standard0Consumption1 Hour0.411
westeuropeUSD2021-09-28T00:00:00ZfalseEU West811e3118-5380-5ee8-a5d9-01d48d0a0627Standard vCoreDZH318Z08M23Azure Purview Scanning Ingestion and Classification0.63AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M23/0009Standard0Consumption1 Hour0.63

9.5.5 - http_request_post plugin

Learn how to use the http_request_post plugin to send an HTTP request and convert the response into a table.

The http_request_post plugin sends an HTTP POST request and converts the response into a table.

Prerequisites

Syntax

evaluate http_request_post ( Uri [, RequestHeaders [, Options [, Content]]] )

Parameters

NameTypeRequiredDescription
Uristring✔️The destination URI for the HTTP or HTTPS request.
RequestHeadersdynamicA property bag containing HTTP headers to send with the request.
OptionsdynamicA property bag containing additional properties of the request.
ContentstringThe body content to send with the request. The content is encoded in UTF-8 and the media type for the Content-Type attribute is application/json.

Authentication and authorization

To authenticate, use the HTTP standard Authorization header or any custom header supported by the web service.

Returns

The plugin returns a table that has a single record with the following dynamic columns:

  • ResponseHeaders: A property bag with the response header.
  • ResponseBody: The response body parsed as a value of type dynamic.

If the HTTP response indicates (via the Content-Type response header) that the media type is application/json, the response body is automatically parsed as if it were a JSON object. Otherwise, it’s returned as-is.

Headers

The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:

NameDescription
x-ms-client-request-idA correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID.
x-ms-readonlyA flag indicating that the processor of this request shouldn’t make any persistent changes.

Example

The following example is for a hypothetical HTTPS web service that accepts additional request headers and requires authentication using Microsoft Entra ID:

let uri='https://example.com/node/js/on/eniac';
let headers=dynamic({'x-ms-correlation-vector':'abc.0.1.0', 'authorization':'bearer ...Azure-AD-bearer-token-for-target-endpoint...'});
evaluate http_request_post(uri, headers)
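Because Content is the fourth argument, an options bag (here empty) must be supplied to send a request body. The following sketch posts a small JSON payload to a hypothetical endpoint; the URI, headers, and payload are illustrative:

let uri = 'https://example.com/api/items';
let headers = dynamic({'x-ms-correlation-vector': 'abc.0.1.0'});
let options = dynamic({});
let content = '{"name": "test-item", "value": 42}';
evaluate http_request_post(uri, headers, options, content)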

9.5.6 - mysql_request plugin

Learn how to use the mysql_request plugin to send a SQL query to a MySQL server network endpoint.

The mysql_request plugin sends a SQL query to an Azure MySQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.

The plugin is invoked with the evaluate operator.

Syntax

evaluate mysql_request ( ConnectionString , SqlQuery [, SqlParameters] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points at the MySQL Server network endpoint. See authentication and how to specify the network endpoint.
SqlQuerystring✔️The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query.
SqlParametersdynamicA property bag object that holds key-value pairs to pass as parameters along with the query.
OutputSchemaThe names and types for the expected columns of the mysql_request plugin output.

Syntax: ( ColumnName : ColumnType [, …] )

Authentication and authorization

To authorize to a MySQL Server network endpoint, you need to specify the authorization information in the connection string. The supported authorization method is via username and password.

Set callout policy

The plugin makes callouts to the MySql database. Make sure that the cluster’s callout policy enables calls of type mysql to the target MySqlDbUri.

The following example shows how to define the callout policy for MySQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1, my_endpoint2).

[
  {
    "CalloutType": "mysql",
    "CalloutUriRegex": "my_endpoint1\\.mysql\\.database\\.azure\\.com",
    "CanCall": true
  },
  {
    "CalloutType": "mysql",
    "CalloutUriRegex": "my_endpoint2\\.mysql\\.database\\.azure\\.com",
    "CanCall": true
  }
]

The following example shows an .alter callout policy command for mysql CalloutType:

.alter cluster policy callout @'[{"CalloutType": "mysql", "CalloutUriRegex": "\\.mysql\\.database\\.azure\\.com", "CanCall": true}]'

Username and password authentication

The mysql_request plugin only supports username and password authentication to the MySQL server endpoint and doesn’t integrate with Microsoft Entra authentication.

The username and password are provided as part of the connections string using the following parameters:

User ID=...; Password=...;

Encryption and server validation

For security, SslMode is unconditionally set to Required when connecting to a MySQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.

Specify the network endpoint

Specify the MySQL network endpoint as part of the connection string.

Syntax:

Server = FQDN [; Port = Port]

Where:

  • FQDN is the fully qualified domain name of the endpoint.
  • Port is the TCP port of the endpoint. By default, 3306 is assumed.

Examples

SQL query to Azure MySQL DB

The following example sends a SQL query to an Azure MySQL database. It retrieves all records from [dbo].[Table], and then processes the results.

evaluate mysql_request(
    'Server=contoso.mysql.database.azure.com; Port = 3306;'
    'Database=Fabrikam;'
    h'UID=USERNAME;'
    h'Pwd=PASSWORD;',
    'select * from `dbo`.`Table`') : (Id: int, Name: string)
| where Id > 0
| project Name

SQL query to an Azure MySQL database with modifications

The following example sends a SQL query to an Azure MySQL database retrieving all records from [dbo].[Table], while appending another datetime column, and then processes the results on the Kusto side. It specifies a SQL parameter (@param0) to be used in the SQL query.

evaluate mysql_request(
    'Server=contoso.mysql.database.azure.com; Port = 3306;'
    'Database=Fabrikam;'
    h'UID=USERNAME;'
    h'Pwd=PASSWORD;',
    'select *, @param0 as dt from `dbo`.`Table`',
    dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name

SQL query to an Azure MySQL database without a query-defined output schema

The following example sends a SQL query to an Azure MySQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.

evaluate mysql_request(
    'Server=contoso.mysql.database.azure.com; Port = 3306;'
    'Database=Fabrikam;'
    h'UID=USERNAME;'
    h'Pwd=PASSWORD;',
    'select * from `dbo`.`Table`')
| where Id > 0
| project Name

9.5.7 - postgresql_request plugin

Learn how to use the postgresql_request plugin to send a SQL query to a PostgreSQL server network endpoint.

The postgresql_request plugin sends a SQL query to an Azure PostgreSQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.

The plugin is invoked with the evaluate operator.

Syntax

evaluate postgresql_request ( ConnectionString , SqlQuery [, SqlParameters] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points at the PostgreSQL Server network endpoint. See authentication and how to specify the network endpoint.
SqlQuerystring✔️The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query.
SqlParametersdynamicA property bag object that holds key-value pairs to pass as parameters along with the query.
OutputSchemaThe names and types for the expected columns of the postgresql_request plugin output.

Syntax: ( ColumnName : ColumnType [, …] )

Authentication and authorization

To authorize a PostgreSQL Server network endpoint, you must specify the authorization information in the connection string. The supported authorization method is via username and password.

Set callout policy

The plugin makes callouts to the PostgreSQL database. Make sure that the cluster’s callout policy enables calls of type postgresql to the target PostgreSqlDbUri.

The following example shows how to define the callout policy for PostgreSQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1, my_endpoint2).

[
  {
    "CalloutType": "postgresql",
    "CalloutUriRegex": "my_endpoint1\\.postgres\\.database\\.azure\\.com",
    "CanCall": true
  },
  {
    "CalloutType": "postgresql",
    "CalloutUriRegex": "my_endpoint2\\.postgres\\.database\\.azure\\.com",
    "CanCall": true
  }
]

The following example shows an .alter callout policy command for the postgresql CalloutType:

.alter cluster policy callout @'[{"CalloutType": "postgresql", "CalloutUriRegex": "\\.postgres\\.database\\.azure\\.com", "CanCall": true}]'

Username and password authentication

The postgresql_request plugin only supports username and password authentication to the PostgreSQL server endpoint and doesn’t integrate with Microsoft Entra authentication.

The username and password are provided as part of the connections string using the following parameters:

User ID=...; Password=...;

Encryption and server validation

For security, SslMode is unconditionally set to Required when connecting to a PostgreSQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.

Specify the network endpoint

Specify the PostgreSQL network endpoint as part of the connection string.

Syntax:

Host = FQDN [; Port = Port]

Where:

  • FQDN is the fully qualified domain name of the endpoint.
  • Port is the TCP port of the endpoint.

Examples

SQL query to Azure PostgreSQL DB

The following example sends a SQL query to an Azure PostgreSQL database. It retrieves all records from public."Table", and then processes the results.

evaluate postgresql_request(
    'Host=contoso.postgres.database.azure.com; Port = 5432;'
    'Database=Fabrikam;'
    h'User Id=USERNAME;'
    h'Password=PASSWORD;',
    'select * from public."Table"') : (Id: int, Name: string)
| where Id > 0
| project Name

SQL query to an Azure PostgreSQL database with modifications

The following example sends a SQL query to an Azure PostgreSQL database retrieving all records from public."Table", while appending another datetime column, and then processes the results. It specifies a SQL parameter (@param0) to be used in the SQL query.

evaluate postgresql_request(
    'Host=contoso.postgres.database.azure.com; Port = 5432;'
    'Database=Fabrikam;'
    h'User Id=USERNAME;'
    h'Password=PASSWORD;',
    'select *, @param0 as dt from public."Table"',
    dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id: int, Name: string, dt: datetime)
| where Id > 0
| project Name

SQL query to an Azure PostgreSQL database without a query-defined output schema

The following example sends a SQL query to an Azure PostgreSQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.

evaluate postgresql_request(
    'Host=contoso.postgres.database.azure.com; Port = 5432;'
    'Database=Fabrikam;'
    h'User Id=USERNAME;'
    h'Password=PASSWORD;',
    'select * from public."Table"')
| where Id > 0
| project Name

9.5.8 - sql_request plugin

Learn how to use the sql_request plugin to send an SQL query to an SQL server network endpoint.

The sql_request plugin sends a SQL query to an Azure SQL Server network endpoint and returns the results. If more than one rowset is returned by SQL, only the first one is used. The plugin is invoked with the evaluate operator.

Syntax

evaluate sql_request ( ConnectionString , SqlQuery [, SqlParameters [, Options]] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points at the SQL Server network endpoint. See valid methods of authentication and how to specify the network endpoint.
SqlQuerystring✔️The query that is to be executed against the SQL endpoint. The query must return one or more row sets, but only the first one is made available for the rest of the Kusto query.
SqlParametersdynamicA property bag of key-value pairs to pass as parameters along with the query.
OptionsdynamicA property bag of key-value pairs to pass more advanced settings along with the query. Currently, only token can be set, to pass a caller-provided Microsoft Entra access token that is forwarded to the SQL endpoint for authentication.
OutputSchemastringThe names and types for the expected columns of the sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ).

Authentication and authorization

The sql_request plugin supports the following methods of authentication to the SQL Server endpoint.

  • Microsoft Entra integrated: Add Authentication="Active Directory Integrated" to the ConnectionString parameter. The user or application authenticates via Microsoft Entra ID to your cluster, and the same token is then used to access the SQL Server network endpoint. The principal must have the appropriate permissions on the SQL resource to perform the requested action. For example, to read from the database the principal needs table SELECT permissions, and to write to an existing table the principal needs UPDATE and INSERT permissions. To write to a new table, CREATE permissions are also required.
  • Managed identity: Add Authentication="Active Directory Managed Identity";User Id={object_id} to the ConnectionString parameter. The request is executed on behalf of a managed identity, which must have the appropriate permissions on the SQL resource to perform the requested action. To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy.
  • Username and password: Add User ID=...; Password=...; to the ConnectionString parameter. When possible, avoid this method as it may be less secure.
  • Microsoft Entra access token: Add dynamic({'token': h"eyJ0..."}) in the Options parameter. The access token is passed as the token property in the Options argument of the plugin.
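As a sketch, a managed identity connection string adds the fragment shown above to the connection strings used in the examples that follow; the server, database, and object ID are placeholders:

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Authentication="Active Directory Managed Identity";User Id=<ObjectId>;'
    'Initial Catalog=Fabrikam;',
  'select * from [dbo].[Table]') : (Id:long, Name:string)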

Examples

Send a SQL query using Microsoft Entra integrated authentication

The following example sends a SQL query to an Azure SQL DB database. It retrieves all records from [dbo].[Table], and then processes the results on the Kusto side. Authentication reuses the calling user’s Microsoft Entra token.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Authentication="Active Directory Integrated";'
    'Initial Catalog=Fabrikam;',
  'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name

Send a SQL query using Username/Password authentication

The following example is identical to the previous one, except that SQL authentication is done by username/password. For confidentiality, we use obfuscated strings here.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Initial Catalog=Fabrikam;'
    h'User ID=USERNAME;'
    h'Password=PASSWORD;',
  'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name

Send a SQL query with SQL parameters

The following example sends a SQL query to an Azure SQL database retrieving all records from [dbo].[Table], while appending another datetime column, and then processes the results on the Kusto side. It specifies a SQL parameter (@param0) to be used in the SQL query.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Authentication="Active Directory Integrated";'
    'Initial Catalog=Fabrikam;',
  'select *, @param0 as dt from [dbo].[Table]',
  dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name

Send a SQL query without a query-defined output schema

The following example sends a SQL query to an Azure SQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Initial Catalog=Fabrikam;'
    h'User ID=USERNAME;'
    h'Password=PASSWORD;',
  'select * from [dbo].[Table]')
| where Id > 0
| project Name

Encryption and server validation

The following connection properties are forced when connecting to a SQL Server network endpoint, for security reasons.

  • Encrypt is set to true unconditionally.
  • TrustServerCertificate is set to false unconditionally.

As a result, the SQL Server must be configured with a valid SSL/TLS server certificate.

Specify the network endpoint

Specifying the SQL network endpoint as part of the connection string is mandatory. The appropriate syntax is:

Server = tcp: FQDN [, Port]

Where:

  • FQDN is the fully qualified domain name of the endpoint.
  • Port is the TCP port of the endpoint. By default, 1433 is assumed.

9.6 - User and sequence analytics plugins

9.6.1 - active_users_count plugin

Learn how to use the active_users_count plugin to calculate the distinct count of values that appeared in a minimum number of periods in a lookback period.

Calculates distinct count of values, where each value has appeared in at least a minimum number of periods in a lookback period.

Useful for calculating distinct counts of “fans” only, while not including appearances of “non-fans”. A user is counted as a “fan” only if it was active during the lookback period. The lookback period is only used to determine whether a user is considered active (“fan”) or not. The aggregation itself doesn’t include users from the lookback window. In comparison, the sliding_window_counts aggregation is performed over a sliding window of the lookback period.

Syntax

T | evaluate active_users_count(IdColumn, TimelineColumn, Start, End, LookbackWindow, Period, ActivePeriodsCount, Bin [, dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input used to count active users.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents timeline.
Startdatetime✔️The analysis start period.
Enddatetime✔️The analysis end period.
LookbackWindowtimespan✔️The time window defining a period where user appearance is checked. The lookback period starts at ([current appearance] - [lookback window]) and ends on ([current appearance]).
Periodtimespan✔️The period that counts as a single appearance. A user is counted as active if it appears in at least ActivePeriodsCount distinct periods of this length.
ActivePeriodsCountdecimal✔️The minimal number of distinct active periods needed to decide that a user is active. Active users are those users who appeared in at least (equal to or greater than) this number of active periods.
Bindecimal, datetime, or timespan✔️A constant value of the analysis step period. May also be a string of week, month, or year, in which case all periods are calculated by the corresponding startofweek, startofmonth, or startofyear function.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.

Returns

Returns a table with the distinct count of IDs that appeared in at least ActivePeriodsCount periods during the lookback period, for each timeline period and each existing dimensions combination.

Output table schema is:

TimelineColumndim1..dim_ndcount_values
type: as of TimelineColumn......long

Examples

The following example calculates the weekly number of distinct users who appeared on at least three different days during the prior eight days. The period of analysis is July 2018.

let Start = datetime(2018-07-01);
let End = datetime(2018-07-31);
let LookbackWindow = 8d;
let Period = 1d;
let ActivePeriods = 3;
let Bin = 7d;
let T =  datatable(User:string, Timestamp:datetime)
[
    "B",      datetime(2018-06-29),
    "B",      datetime(2018-06-30),
    "A",      datetime(2018-07-02),
    "B",      datetime(2018-07-04),
    "B",      datetime(2018-07-08),
    "A",      datetime(2018-07-10),
    "A",      datetime(2018-07-14),
    "A",      datetime(2018-07-17),
    "A",      datetime(2018-07-20),
    "B",      datetime(2018-07-24)
];
T | evaluate active_users_count(User, Timestamp, Start, End, LookbackWindow, Period, ActivePeriods, Bin)

Output

Timestampdcount
2018-07-01 00:00:00.00000001
2018-07-15 00:00:00.00000001

A user is considered active if it fulfills both of the following criteria:

  • The user was seen in at least three distinct days (Period = 1d, ActivePeriods=3).
  • The user was seen in a lookback window of 8d before and including their current appearance.

In the illustration below, the only appearances that are active by these criteria are the following instances: User A on 7/20 and User B on 7/4 (see plugin results above). The appearances of User B on 6/29 and 6/30 are counted toward the lookback window of the 7/4 appearance, but fall outside the Start-End analysis range of the query.

Graph showing active users based on the lookback window and active period specified in the query.

9.6.2 - activity_counts_metrics plugin

Learn how to use the activity_counts_metrics plugin to compare activity metrics in different time windows.

Calculates useful activity metrics for each time window compared/aggregated to all previous time windows. Metrics include: total count values, distinct count values, distinct count of new values, and aggregated distinct count. Compare this plugin to activity_metrics plugin, in which every time window is compared to its previous time window only.

Syntax

T | evaluate activity_counts_metrics(IdColumn, TimelineColumn, Start, End, Step [, Dimensions])

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input used to count activities.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents the timeline.
Startdatetime✔️The analysis start period.
Enddatetime✔️The analysis end period.
Stepdecimal, datetime, or timespan✔️The analysis window period. The value may also be a string of week, month, or year, in which case all periods would be startofweek, startofmonth, or startofyear.
DimensionsstringZero or more comma-separated dimensions columns that slice the activity metrics calculation.

Returns

Returns a table that has the total count values, distinct count values, distinct count of new values, and aggregated distinct count for each time window. If Dimensions are provided, then there’s another column for each dimension in the output table.

The following table describes the output table schema.

Column nameTypeDescription
TimestampSame as the provided TimelineColumn argumentThe time window start time.
countlongThe total records count in the time window and dim(s)
dcountlongThe distinct ID values count in the time window and dim(s)
new_dcountlongThe distinct ID values in the time window and dim(s) compared to all previous time windows.
aggregated_dcountlongThe total aggregated distinct ID values of dim(s) from first-time window to current (inclusive).

Examples

Daily activity counts

The next query calculates daily activity counts for the provided input table.

let start=datetime(2017-08-01);
let end=datetime(2017-08-04);
let window=1d;
let T = datatable(UserId:string, Timestamp:datetime)
[
'A', datetime(2017-08-01),
'D', datetime(2017-08-01),
'J', datetime(2017-08-01),
'B', datetime(2017-08-01),
'C', datetime(2017-08-02),
'T', datetime(2017-08-02),
'J', datetime(2017-08-02),
'H', datetime(2017-08-03),
'T', datetime(2017-08-03),
'T', datetime(2017-08-03),
'J', datetime(2017-08-03),
'B', datetime(2017-08-03),
'S', datetime(2017-08-03),
'S', datetime(2017-08-04),
];
 T
 | evaluate activity_counts_metrics(UserId, Timestamp, start, end, window)

Output

Timestampcountdcountnew_dcountaggregated_dcount
2017-08-01 00:00:00.00000004444
2017-08-02 00:00:00.00000003326
2017-08-03 00:00:00.00000006528
2017-08-04 00:00:00.00000001108
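As a sketch, adding a dimension column slices the same counts per group; the Group column and the small datatable below are illustrative:

let start = datetime(2017-08-01);
let end = datetime(2017-08-04);
let window = 1d;
datatable(UserId:string, Timestamp:datetime)
[
    'A', datetime(2017-08-01),
    'B', datetime(2017-08-01),
    'C', datetime(2017-08-02),
    'A', datetime(2017-08-02),
    'D', datetime(2017-08-03),
    'B', datetime(2017-08-04),
]
| extend Group = iff(UserId in ('A', 'B'), 'G1', 'G2')
| evaluate activity_counts_metrics(UserId, Timestamp, start, end, window, Group)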

9.6.3 - activity_engagement plugin

Learn how to use the activity_engagement plugin to calculate activity engagement ratios.

Calculates activity engagement ratio based on ID column over a sliding timeline window.

The activity_engagement plugin can be used for calculating DAU/WAU/MAU (daily/weekly/monthly activities).

Syntax

T | evaluate activity_engagement(IdColumn, TimelineColumn, [Start, End,] InnerActivityWindow, OuterActivityWindow [, dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input used to calculate engagement.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents timeline.
StartdatetimeThe analysis start period.
EnddatetimeThe analysis end period.
InnerActivityWindowtimespan✔️The inner-scope analysis window period.
OuterActivityWindowtimespan✔️The outer-scope analysis window period.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.

Returns

Returns a table that has a distinct count of ID values inside an inner-scope window, inside an outer-scope window, and the activity ratio for each inner-scope window period for each existing dimensions combination.

Output table schema is:

TimelineColumndcount_activities_innerdcount_activities_outeractivity_ratiodim1..dim_n
type: as of TimelineColumnlonglongdouble......

Examples

DAU/WAU calculation

The following example calculates DAU/WAU (Daily Active Users / Weekly Active Users ratio) over randomly generated data.

// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-01-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/WAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 7d)
| project _day, Dau_Wau=activity_ratio*100 
| render timechart 

:::image type=“content” source=“media/activity-engagement-plugin/activity-engagement-dau-wau.png” border=“false” alt-text=“Graph displaying the ratio of daily active users to weekly active users as specified in the query.”:::

DAU/MAU calculation

The following example calculates DAU/MAU (Daily Active Users / Monthly Active Users ratio) over randomly generated data.

// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d)
| project _day, Dau_Mau=activity_ratio*100 
| render timechart 

:::image type=“content” source=“media/activity-engagement-plugin/activity-engagement-dau-mau.png” border=“false” alt-text=“Graph displaying the ratio of daily active users to monthly active users as specified in the query.”:::

DAU/MAU calculation with additional dimensions

The following example calculates DAU/MAU (Daily Active Users / Monthly Active Users ratio) over randomly generated data with an additional dimension (mod3).

// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
| extend mod3 = strcat("mod3=", id % 3)
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d, mod3)
| project _day, Dau_Mau=activity_ratio*100, mod3 
| render timechart 

:::image type=“content” source=“media/activity-engagement-plugin/activity-engagement-dau-mau-mod3.png” border=“false” alt-text=“Graph displaying the ratio of daily active users to monthly active users with modulo 3 as specified in the query.”:::

9.6.4 - activity_metrics plugin

Learn how to use the activity_metrics plugin to calculate activity metrics using the current time window compared to the previous window.

Calculates useful metrics that include distinct count values, distinct count of new values, retention rate, and churn rate. This plugin is different from activity_counts_metrics plugin in which every time window is compared to all previous time windows.

Syntax

T | evaluate activity_metrics(IdColumn, TimelineColumn, [Start, End,] Window [, dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The input used to calculate activity metrics.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents timeline.
Startdatetime✔️The analysis start period.
Enddatetime✔️The analysis end period.
Windowdecimal, datetime, or timespan✔️The analysis window period. This value may also be a string of week, month, or year, in which case all periods will be startofweek, startofmonth, or startofyear respectively.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.

Returns

The plugin returns a table with the distinct count values, distinct count of new values, retention rate, and churn rate for each timeline period for each existing dimensions combination.

Output table schema is:

TimelineColumndcount_valuesdcount_newvaluesretention_ratechurn_ratedim1..dim_n
type: as of TimelineColumnlonglongdoubledouble......

Notes

Retention Rate Definition

Retention Rate over a period is calculated as:

retention_rate = (# of customers returned during the period) / (# of customers at the beginning of the period)

where the # of customers returned during the period is defined as:

# of customers returned during the period = (# of customers at the end of the period) - (# of new customers acquired during the period)

Retention Rate can vary from 0.0 to 1.0. A higher score means a larger number of returning users.

Churn Rate Definition

Churn Rate over a period is calculated as:

churn_rate = (# of customers lost in the period) / (# of customers at the beginning of the period)

where the # of customers lost in the period is defined as:

# of customers lost in the period = (# of customers at the beginning of the period) - (# of customers at the end of the period that were also present at the beginning)

Churn Rate can vary from 0.0 to 1.0. A higher score means a larger number of users are not returning to the service.

Churn vs. Retention Rate

The churn rate is derived directly from the retention rate, because every ID counted at the beginning of a period is either retained or churned. The following calculation is always true:

churn_rate = 1.0 - retention_rate
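Because every ID counted at the beginning of a window is either retained or churned, the two rates in the plugin output always sum to 1.0 (the first window has no previous window, so both rates are NaN there). A quick check that reuses the generated data from the example below:

// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, rate_sum = retention_rate + churn_rate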

Examples

Weekly retention rate and churn rate

The next query calculates retention and churn rate for week-over-week window.

// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, retention_rate, churn_rate
| render timechart

Output

_dayretention_ratechurn_rate
2017-01-02 00:00:00.0000000NaNNaN
2017-01-09 00:00:00.00000000.1799100449775110.820089955022489
2017-01-16 00:00:00.00000000.7443744374437440.255625562556256
2017-01-23 00:00:00.00000000.6120967741935480.387903225806452
2017-01-30 00:00:00.00000000.6811414392059550.318858560794045
2017-02-06 00:00:00.00000000.2781456953642380.721854304635762
2017-02-13 00:00:00.00000000.2231726283048210.776827371695179
2017-02-20 00:00:00.00000000.380.62
2017-02-27 00:00:00.00000000.2955190017016450.704480998298355
2017-03-06 00:00:00.00000000.2803877703206560.719612229679344
2017-03-13 00:00:00.00000000.3606281547952890.639371845204711
2017-03-20 00:00:00.00000000.2880080280983440.711991971901656
2017-03-27 00:00:00.00000000.3061349693251530.693865030674847
2017-04-03 00:00:00.00000000.3568665377176020.643133462282398
2017-04-10 00:00:00.00000000.4950980392156860.504901960784314
2017-04-17 00:00:00.00000000.1982968369829680.801703163017032
2017-04-24 00:00:00.00000000.06188118811881190.938118811881188
2017-05-01 00:00:00.00000000.2046577275935070.795342272406493
2017-05-08 00:00:00.00000000.5173913043478260.482608695652174
2017-05-15 00:00:00.00000000.1436672967863890.856332703213611
2017-05-22 00:00:00.00000000.1991223258365330.800877674163467
2017-05-29 00:00:00.00000000.0634689922480620.936531007751938

Timechart showing the calculated retention and churn rates per seven days, as specified in the query.

Distinct values and distinct 'new' values

The next query calculates distinct values and 'new' values (IDs that didn't appear in the previous time window) for a week-over-week window.

// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, dcount_values, dcount_newvalues
| render timechart

Output

_daydcount_valuesdcount_newvalues
2017-01-02 00:00:00.0000000630630
2017-01-09 00:00:00.0000000738575
2017-01-16 00:00:00.00000001187841
2017-01-23 00:00:00.00000001092465
2017-01-30 00:00:00.00000001261647
2017-02-06 00:00:00.000000017441043
2017-02-13 00:00:00.00000001563432
2017-02-20 00:00:00.00000001406818
2017-02-27 00:00:00.000000019561429
2017-03-06 00:00:00.00000001593848
2017-03-13 00:00:00.000000018011423
2017-03-20 00:00:00.000000017101017
2017-03-27 00:00:00.000000017961516
2017-04-03 00:00:00.000000013811008
2017-04-10 00:00:00.000000017561162
2017-04-17 00:00:00.000000018311409
2017-04-24 00:00:00.000000018231164
2017-05-01 00:00:00.000000018111353
2017-05-08 00:00:00.000000016911246
2017-05-15 00:00:00.000000018121608
2017-05-22 00:00:00.000000017401017
2017-05-29 00:00:00.0000000960756

Timechart showing the count of distinct values (dcount_values) and of new distinct values (dcount_newvalues) that didn't appear in the previous time window, as specified in the query.
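
The dimension parameters aren't demonstrated above. The following is a minimal sketch (not part of the original examples) that reuses the same randomly generated data and adds a hypothetical Platform dimension derived from the ID, so that the weekly metrics are sliced per dimension value:

// Generate random data of user activities, plus a hypothetical 'Platform' dimension
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
| extend Platform = iff(id % 2 == 0, "web", "mobile")
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d, Platform)
| project _day, Platform, retention_rate, churn_rate
| render timechart

The output contains one row per week per Platform value, and the timechart renders a separate series for each (Platform, metric) combination.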

9.6.5 - funnel_sequence plugin

Learn how to use the funnel_sequence plugin to calculate the distinct count of users who have taken a sequence of states, and the distribution of previous/next states that have led to/were followed by the sequence.

Calculates distinct count of users who have taken a sequence of states, and the distribution of previous/next states that have led to/were followed by the sequence. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate funnel_sequence(IdColumn, TimelineColumn, Start, End, MaxSequenceStepWindow, Step, StateColumn, Sequence)

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
IdColumnstring✔️The column reference representing the ID. This column must be present in T.
TimelineColumnstring✔️The column reference representing the timeline. This column must be present in T.
Startdatetime, timespan, or long✔️The analysis start period.
Enddatetime, timespan, or long✔️The analysis end period.
MaxSequenceStepWindowdatetime, timespan, or long✔️The value of the max allowed timespan between two sequential steps in the sequence.
Stepdatetime, timespan, or long✔️The analysis step period, or bin.
StateColumnstring✔️The column reference representing the state. This column must be present in T.
Sequencedynamic✔️An array with the sequence values that are looked up in StateColumn.

Returns

Returns three output tables, which are useful for constructing a sankey diagram for the analyzed sequence:

  • Table #1 - prev-sequence-next dcount

    • TimelineColumn: the analyzed time window
    • prev: the previous state (may be empty for users who only had events for the searched sequence, but no events prior to it).
    • next: the next state (may be empty for users who only had events for the searched sequence, but no events following it).
    • dcount: distinct count of IdColumn in the time window that transitioned prev -> Sequence -> next.
    • samples: an array of IDs (from IdColumn) corresponding to the row’s sequence (a maximum of 128 IDs are returned).
  • Table #2 - prev-sequence dcount

    • TimelineColumn: the analyzed time window
    • prev: the previous state (may be empty for users who only had events for the searched sequence, but no events prior to it).
    • dcount: distinct count of IdColumn in the time window that transitioned prev -> Sequence.
    • samples: an array of IDs (from IdColumn) corresponding to the row’s sequence (a maximum of 128 IDs are returned).
  • Table #3 - sequence-next dcount

    • TimelineColumn: the analyzed time window
    • next: the next state (may be empty for users who only had events for the searched sequence, but no events following it).
    • dcount: distinct count of IdColumn in the time window that transitioned Sequence -> next.
    • samples: an array of IDs (from IdColumn) corresponding to the row’s sequence (a maximum of 128 IDs are returned).

Examples

Exploring storm events

The following query looks at the table StormEvents (weather statistics for 2007) and shows which events happened before/after all Tornado events occurred in 2007.

// Looking on StormEvents statistics: 
// Q1: What happens before Tornado event?
// Q2: What happens after Tornado event?
StormEvents
| evaluate funnel_sequence(EpisodeId, StartTime, datetime(2007-01-01), datetime(2008-01-01), 1d, 365d, EventType, dynamic(['Tornado']))

Result includes three tables:

  • Table #1: All possible variants of what happened before and after the sequence. For example, the second line means that there were 87 different events that had the following sequence: Hail -> Tornado -> Hail
StartTimeprevnextdcount
2007-01-01 00:00:00.0000000293
2007-01-01 00:00:00.0000000HailHail87
2007-01-01 00:00:00.0000000Thunderstorm WindThunderstorm Wind77
2007-01-01 00:00:00.0000000HailThunderstorm Wind28
2007-01-01 00:00:00.0000000Hail28
2007-01-01 00:00:00.0000000Hail27
2007-01-01 00:00:00.0000000Thunderstorm Wind25
2007-01-01 00:00:00.0000000Thunderstorm WindHail24
2007-01-01 00:00:00.0000000Thunderstorm Wind24
2007-01-01 00:00:00.0000000Flash FloodFlash Flood12
2007-01-01 00:00:00.0000000Thunderstorm WindFlash Flood8
2007-01-01 00:00:00.0000000Flash Flood8
2007-01-01 00:00:00.0000000Funnel CloudThunderstorm Wind6
2007-01-01 00:00:00.0000000Funnel Cloud6
2007-01-01 00:00:00.0000000Flash Flood6
2007-01-01 00:00:00.0000000Funnel CloudFunnel Cloud6
2007-01-01 00:00:00.0000000HailFlash Flood4
2007-01-01 00:00:00.0000000Flash FloodThunderstorm Wind4
2007-01-01 00:00:00.0000000HailFunnel Cloud4
2007-01-01 00:00:00.0000000Funnel CloudHail4
2007-01-01 00:00:00.0000000Funnel Cloud4
2007-01-01 00:00:00.0000000Thunderstorm WindFunnel Cloud3
2007-01-01 00:00:00.0000000Heavy RainThunderstorm Wind2
2007-01-01 00:00:00.0000000Flash FloodFunnel Cloud2
2007-01-01 00:00:00.0000000Flash FloodHail2
2007-01-01 00:00:00.0000000Strong WindThunderstorm Wind1
2007-01-01 00:00:00.0000000Heavy RainFlash Flood1
2007-01-01 00:00:00.0000000Heavy RainHail1
2007-01-01 00:00:00.0000000HailFlood1
2007-01-01 00:00:00.0000000LightningHail1
2007-01-01 00:00:00.0000000Heavy RainLightning1
2007-01-01 00:00:00.0000000Funnel CloudHeavy Rain1
2007-01-01 00:00:00.0000000Flash FloodFlood1
2007-01-01 00:00:00.0000000FloodFlash Flood1
2007-01-01 00:00:00.0000000Heavy Rain1
2007-01-01 00:00:00.0000000Funnel CloudLightning1
2007-01-01 00:00:00.0000000LightningThunderstorm Wind1
2007-01-01 00:00:00.0000000FloodThunderstorm Wind1
2007-01-01 00:00:00.0000000HailLightning1
2007-01-01 00:00:00.0000000Lightning1
2007-01-01 00:00:00.0000000Tropical StormHurricane (Typhoon)1
2007-01-01 00:00:00.0000000Coastal Flood1
2007-01-01 00:00:00.0000000Rip Current1
2007-01-01 00:00:00.0000000Heavy Snow1
2007-01-01 00:00:00.0000000Strong Wind1
  • Table #2: shows all distinct events grouped by the previous event. For example, the second line shows that there were a total of 150 events of Hail that happened just before Tornado.
StartTimeprevdcount
2007-01-01 00:00:00.0000000331
2007-01-01 00:00:00.0000000Hail150
2007-01-01 00:00:00.0000000Thunderstorm Wind135
2007-01-01 00:00:00.0000000Flash Flood28
2007-01-01 00:00:00.0000000Funnel Cloud22
2007-01-01 00:00:00.0000000Heavy Rain5
2007-01-01 00:00:00.0000000Flood2
2007-01-01 00:00:00.0000000Lightning2
2007-01-01 00:00:00.0000000Strong Wind2
2007-01-01 00:00:00.0000000Heavy Snow1
2007-01-01 00:00:00.0000000Rip Current1
2007-01-01 00:00:00.0000000Coastal Flood1
2007-01-01 00:00:00.0000000Tropical Storm1
  • Table #3: shows all distinct events grouped by the next event. For example, the second line shows that there were a total of 145 events of Hail that happened after Tornado.
StartTimenextdcount
2007-01-01 00:00:00.0000000332
2007-01-01 00:00:00.0000000Hail145
2007-01-01 00:00:00.0000000Thunderstorm Wind143
2007-01-01 00:00:00.0000000Flash Flood32
2007-01-01 00:00:00.0000000Funnel Cloud21
2007-01-01 00:00:00.0000000Lightning4
2007-01-01 00:00:00.0000000Heavy Rain2
2007-01-01 00:00:00.0000000Flood2
2007-01-01 00:00:00.0000000Hurricane (Typhoon)1

Now, let’s try to find out how the following sequence continues:
Hail -> Tornado -> Thunderstorm Wind

StormEvents
| evaluate funnel_sequence(
               EpisodeId,
               StartTime,
               datetime(2007-01-01),
               datetime(2008-01-01),
               1d,
               365d,
               EventType, 
               dynamic(['Hail', 'Tornado', 'Thunderstorm Wind'])
           )

Skipping Table #1 and Table #2, and looking at Table #3, we can conclude that the sequence Hail -> Tornado -> Thunderstorm Wind ended with no further event in 92 cases, continued as Hail in 41 cases, and turned back to Tornado in 14 cases.

StartTimenextdcount
2007-01-01 00:00:00.000000092
2007-01-01 00:00:00.0000000Hail41
2007-01-01 00:00:00.0000000Tornado14
2007-01-01 00:00:00.0000000Flash Flood11
2007-01-01 00:00:00.0000000Lightning2
2007-01-01 00:00:00.0000000Heavy Rain1
2007-01-01 00:00:00.0000000Flood1

9.6.6 - funnel_sequence_completion plugin

Learn how to use the funnel_sequence_completion plugin to calculate a funnel of completed sequence steps while comparing different time periods.

Calculates a funnel of completed sequence steps while comparing different time periods. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate funnel_sequence_completion(IdColumn, TimelineColumn, Start, End, BinSize, StateColumn, Sequence, MaxSequenceStepPeriods)

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
IdColumnstring✔️The column reference representing the ID. The column must be present in T.
TimelineColumnstring✔️The column reference representing the timeline. The column must be present in T.
Startdatetime, timespan, or long✔️The analysis start period.
Enddatetime, timespan, or long✔️The analysis end period.
BinSizedatetime, timespan, or long✔️The analysis window size. Each window is analyzed separately.
StateColumnstring✔️The column reference representing the state. The column must be present in T.
Sequencedynamic✔️An array with the sequence values that are looked up in StateColumn.
MaxSequenceStepPeriodsdynamic✔️An array with the values of the max allowed timespan between the first and last sequential steps in the sequence. Each period in the array generates a funnel analysis result.

Returns

Returns a single table useful for constructing a funnel diagram for the analyzed sequence:

  • TimelineColumn: the analyzed time window (bin). Each bin in the analysis timeframe (Start to End) generates a separate funnel analysis.
  • StateColumn: the state of the sequence.
  • Period: the maximal period allowed for completing steps in the funnel sequence measured from the first step in the sequence. Each value in MaxSequenceStepPeriods generates a funnel analysis with a separate period.
  • dcount: distinct count of IdColumn in time window that transitioned from first sequence state to the value of StateColumn.

Examples

Exploring Storm Events

The following query checks the completion funnel of the sequence Hail -> Tornado -> Thunderstorm Wind within an overall time of 1 hour, 4 hours, and 1 day.

let _start = datetime(2007-01-01);
let _end =  datetime(2008-01-01);
let _windowSize = 365d;
let _sequence = dynamic(['Hail', 'Tornado', 'Thunderstorm Wind']);
let _periods = dynamic([1h, 4h, 1d]);
StormEvents
| evaluate funnel_sequence_completion(EpisodeId, StartTime, _start, _end, _windowSize, EventType, _sequence, _periods) 

Output

StartTimeEventTypePerioddcount
2007-01-01 00:00:00.0000000Hail01:00:002877
2007-01-01 00:00:00.0000000Tornado01:00:00208
2007-01-01 00:00:00.0000000Thunderstorm Wind01:00:0087
2007-01-01 00:00:00.0000000Hail04:00:002877
2007-01-01 00:00:00.0000000Tornado04:00:00231
2007-01-01 00:00:00.0000000Thunderstorm Wind04:00:00141
2007-01-01 00:00:00.0000000Hail1.00:00:002877
2007-01-01 00:00:00.0000000Tornado1.00:00:00244
2007-01-01 00:00:00.0000000Thunderstorm Wind1.00:00:00155

Understanding the results:
The outcome is three funnels (for periods of one hour, four hours, and one day). For each funnel step, the distinct count of IDs is shown. You can see that the more time given to complete the whole sequence of Hail -> Tornado -> Thunderstorm Wind, the higher the dcount value obtained. In other words, there were more occurrences of the sequence reaching that funnel step.
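
To make the funnel easier to read, the output can be post-processed so that each step is expressed as a share of the first step's dcount per period. The following is a sketch, not part of the original example; the column name pct_of_first_step is arbitrary:

let _start = datetime(2007-01-01);
let _end =  datetime(2008-01-01);
let _windowSize = 365d;
let _sequence = dynamic(['Hail', 'Tornado', 'Thunderstorm Wind']);
let _periods = dynamic([1h, 4h, 1d]);
let funnel = materialize(
    StormEvents
    | evaluate funnel_sequence_completion(EpisodeId, StartTime, _start, _end, _windowSize, EventType, _sequence, _periods));
funnel
| join kind=inner (
    funnel
    | where EventType == 'Hail'
    | project StartTime, Period, first_step_dcount = dcount
  ) on StartTime, Period
| project StartTime, Period, EventType, dcount, pct_of_first_step = round(100.0 * dcount / first_step_dcount, 2)
| order by Period asc, dcount desc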

9.6.7 - new_activity_metrics plugin

Learn how to use the new_activity_metrics plugin to calculate activity metrics.

Calculates useful activity metrics (distinct count values, distinct count of new values, retention rate, and churn rate) for the cohort of New Users. Each cohort of New Users (all users who were first seen in a given time window) is compared to all prior cohorts. The comparison takes into account all previous time windows. For example, for the record from T2 to T3, the distinct count of users is all users in T3 who weren't seen in either T1 or T2. The plugin is invoked with the evaluate operator.

Syntax

TabularExpression | evaluate new_activity_metrics(IdColumn, TimelineColumn, Start, End, Window [, Cohort] [, dim1, dim2, …] [, Lookback] )

Parameters

NameTypeRequiredDescription
TabularExpressionstring✔️The tabular expression for which to calculate activity metrics.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents the timeline.
Startscalar✔️The value of the analysis start period.
Endscalar✔️The value of the analysis end period.
Windowscalar✔️The value of the analysis window period. Can be a numeric, datetime, or timespan value, or a string that is one of week, month, or year, in which case all periods will be startofweek/startofmonth/startofyear accordingly. When using startofweek, make sure the start time is a Sunday, otherwise the first cohort will be empty (since startofweek is considered to be a Sunday).
CohortscalarIndicates a specific cohort. If not provided, all cohorts corresponding to the analysis time window are calculated and returned.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.
LookbackstringA tabular expression with a set of IDs that belong to the 'look back' period.

Returns

Returns a table that contains the following values for each combination of 'from' and 'to' timeline periods and for each existing column (dimensions) combination:

  • distinct count values
  • distinct count of new values
  • retention rate
  • churn rate

Output table schema is:

from_TimelineColumnto_TimelineColumndcount_new_valuesdcount_retained_valuesdcount_churn_valuesretention_ratechurn_ratedim1..dim_n
type: as of TimelineColumnsamelonglongdoubledoubledouble......
  • from_TimelineColumn - the cohort of new users. Metrics in this record refer to all users who were first seen in this period. The decision on first seen takes into account all previous periods in the analysis period.
  • to_TimelineColumn - the period being compared to.
  • dcount_new_values - the number of distinct users in to_TimelineColumn that weren't seen in any period up to and including from_TimelineColumn.
  • dcount_retained_values - out of all new users, first seen in from_TimelineColumn, the number of distinct users that were seen in to_TimelineColumn.
  • dcount_churn_values - out of all new users, first seen in from_TimelineColumn, the number of distinct users that weren't seen in to_TimelineColumn.
  • retention_rate - the percent of dcount_retained_values out of the cohort (users first seen in from_TimelineColumn).
  • churn_rate - the percent of dcount_churn_values out of the cohort (users first seen in from_TimelineColumn).

Examples

The following sample dataset shows which users were seen on which days. The table was generated based on a source Users table, as follows:

Users | summarize tostring(make_set(user)) by bin(Timestamp, 1d) | order by Timestamp asc;

Output

Timestampset_user
2019-11-01 00:00:00.0000000[0,2,3,4]
2019-11-02 00:00:00.0000000[0,1,3,4,5]
2019-11-03 00:00:00.0000000[0,2,4,5]
2019-11-04 00:00:00.0000000[0,1,2,3]
2019-11-05 00:00:00.0000000[0,1,2,3,4]
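
If a Users table isn't available to you, a hypothetical one that reproduces the daily sets above can be sketched with a datatable (the rows are made up to match the table above) and checked with the same summarize query:

let Users = datatable(Timestamp:datetime, user:long)
[
    datetime(2019-11-01), 0, datetime(2019-11-01), 2, datetime(2019-11-01), 3, datetime(2019-11-01), 4,
    datetime(2019-11-02), 0, datetime(2019-11-02), 1, datetime(2019-11-02), 3, datetime(2019-11-02), 4, datetime(2019-11-02), 5,
    datetime(2019-11-03), 0, datetime(2019-11-03), 2, datetime(2019-11-03), 4, datetime(2019-11-03), 5,
    datetime(2019-11-04), 0, datetime(2019-11-04), 1, datetime(2019-11-04), 2, datetime(2019-11-04), 3,
    datetime(2019-11-05), 0, datetime(2019-11-05), 1, datetime(2019-11-05), 2, datetime(2019-11-05), 3, datetime(2019-11-05), 4
];
Users
| summarize tostring(make_set(user)) by bin(Timestamp, 1d)
| order by Timestamp asc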

The output of the plugin for the original table is the following:

let StartDate = datetime(2019-11-01 00:00:00);
let EndDate = datetime(2019-11-07 00:00:00);
Users 
| evaluate new_activity_metrics(user, Timestamp, StartDate, EndDate-1tick, 1d) 
| where from_Timestamp < datetime(2019-11-03 00:00:00.0000000)

Output

Rfrom_Timestampto_Timestampdcount_new_valuesdcount_retained_valuesdcount_churn_valuesretention_ratechurn_rate
12019-11-01 00:00:00.00000002019-11-01 00:00:00.000000044010
22019-11-01 00:00:00.00000002019-11-02 00:00:00.00000002310.750.25
32019-11-01 00:00:00.00000002019-11-03 00:00:00.00000001310.750.25
42019-11-01 00:00:00.00000002019-11-04 00:00:00.00000001310.750.25
52019-11-01 00:00:00.00000002019-11-05 00:00:00.000000014010
62019-11-01 00:00:00.00000002019-11-06 00:00:00.000000000401
72019-11-02 00:00:00.00000002019-11-02 00:00:00.000000022010
82019-11-02 00:00:00.00000002019-11-03 00:00:00.00000000110.50.5
92019-11-02 00:00:00.00000002019-11-04 00:00:00.00000000110.50.5
102019-11-02 00:00:00.00000002019-11-05 00:00:00.00000000110.50.5
112019-11-02 00:00:00.00000002019-11-06 00:00:00.000000000201

Following is an analysis of a few records from the output:

  • Record R=3, from_TimelineColumn = 2019-11-01, to_TimelineColumn = 2019-11-03:

    • The users considered for this record are all new users seen on 11/1. Since this is the first period, these are all users in that bin – [0,2,3,4]
    • dcount_new_values – the number of users on 11/3 who weren’t seen on 11/1. This includes a single user – 5.
    • dcount_retained_values – out of all new users on 11/1, how many were retained until 11/3? There are three such values ([0,2,4]), while dcount_churn_values is one (user=3).
    • retention_rate = 0.75 – the three retained users out of the four new users who were first seen on 11/1.
  • Record R=9, from_TimelineColumn = 2019-11-02, to_TimelineColumn = 2019-11-04:

    • This record focuses on the new users who were first seen on 11/2 – users 1 and 5.
    • dcount_new_values – the number of users on 11/4 who weren’t seen in any of the periods T0 .. from_Timestamp. Meaning, users who are seen on 11/4 but weren’t seen on either 11/1 or 11/2 – there are no such users.
    • dcount_retained_values – out of all new users on 11/2 ([1,5]), how many were retained until 11/4? There’s one such user ([1]), while dcount_churn_values is one (user 5).
    • retention_rate is 0.5 – the single user that was retained on 11/4 out of the two new ones on 11/2.

Weekly retention rate, and churn rate (single week)

The next query calculates retention and churn rates for a week-over-week window for the New Users cohort (users who arrived in the first week).

// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Take only the first week cohort (last parameter)
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d, _start)
| project from_Day, to_Day, retention_rate, churn_rate

Output

from_Dayto_Dayretention_ratechurn_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.000000010
2017-05-01 00:00:00.00000002017-05-08 00:00:00.00000000.5446327683615820.455367231638418
2017-05-01 00:00:00.00000002017-05-15 00:00:00.00000000.0316384180790960.968361581920904
2017-05-01 00:00:00.00000002017-05-22 00:00:00.000000001
2017-05-01 00:00:00.00000002017-05-29 00:00:00.000000001

Weekly retention rate, and churn rate (complete matrix)

The next query calculates retention and churn rates for a week-over-week window for each New Users cohort. While the previous example calculated the statistics for a single week, the following query produces an NxN table for each from/to combination.

// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Last parameter is omitted - 
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d)
| project from_Day, to_Day, retention_rate, churn_rate

Output

from_Dayto_Dayretention_ratechurn_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.000000010
2017-05-01 00:00:00.00000002017-05-08 00:00:00.00000000.1903973509933770.809602649006622
2017-05-01 00:00:00.00000002017-05-15 00:00:00.000000001
2017-05-01 00:00:00.00000002017-05-22 00:00:00.000000001
2017-05-01 00:00:00.00000002017-05-29 00:00:00.000000001
2017-05-08 00:00:00.00000002017-05-08 00:00:00.000000010
2017-05-08 00:00:00.00000002017-05-15 00:00:00.00000000.4052631578947370.594736842105263
2017-05-08 00:00:00.00000002017-05-22 00:00:00.00000000.2276315789473680.772368421052632
2017-05-08 00:00:00.00000002017-05-29 00:00:00.000000001
2017-05-15 00:00:00.00000002017-05-15 00:00:00.000000010
2017-05-15 00:00:00.00000002017-05-22 00:00:00.00000000.7854889589905360.214511041009464
2017-05-15 00:00:00.00000002017-05-29 00:00:00.00000000.2376445846477390.762355415352261
2017-05-22 00:00:00.00000002017-05-22 00:00:00.000000010
2017-05-22 00:00:00.00000002017-05-29 00:00:00.00000000.6218354430379750.378164556962025
2017-05-29 00:00:00.00000002017-05-29 00:00:00.000000010

Weekly retention rate with lookback period

The following query calculates the retention rate of the New Users cohort while taking into consideration a lookback period: a tabular query with a set of IDs that are used to define the New Users cohort (any ID that doesn’t appear in this set is considered a New User). The query examines the retention behavior of the New Users during the analysis period.

// Generate random data of user activities
let _lookback = datetime(2017-02-01);
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
let _data = range Day from _lookback to _end step 1d
    | extend d = tolong((Day - _lookback) / 1d)
    | extend r = rand() + 1
    | extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1) 
    | mv-expand id=_users to typeof(long) limit 1000000;
//
let lookback_data = _data | where Day < _start | project Day, id;
_data
| evaluate new_activity_metrics(id, Day, _start, _end, 7d, _start, lookback_data)
| project from_Day, to_Day, retention_rate

Output

from_Dayto_Dayretention_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.00000001
2017-05-01 00:00:00.00000002017-05-08 00:00:00.00000000.404081632653061
2017-05-01 00:00:00.00000002017-05-15 00:00:00.00000000.257142857142857
2017-05-01 00:00:00.00000002017-05-22 00:00:00.00000000.296326530612245
2017-05-01 00:00:00.00000002017-05-29 00:00:00.00000000.0587755102040816

9.6.8 - rolling_percentile plugin

Learn how to use the rolling_percentile plugin to calculate an estimate of the rolling percentile per bin for the specified value column.

Returns an estimate for the specified percentile of the ValueColumn population in a rolling (sliding) BinsPerWindow size window per BinSize.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate rolling_percentile(ValueColumn, Percentile, IndexColumn, BinSize, BinsPerWindow [, dim1, dim2, …] )

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
ValueColumnstring✔️The name of the column used to calculate the percentiles.
Percentileint, long, or real✔️Scalar with the percentile to calculate.
IndexColumnstring✔️The name of the column over which to run the rolling window.
BinSizeint, long, real, datetime, or timespan✔️Scalar with size of the bins to apply over the IndexColumn.
BinsPerWindowint✔️The number of bins included in each window.
dim1, dim2, …stringA list of the dimensions columns to slice by.

Returns

Returns a table with a row for each bin (and combination of dimensions, if specified) that has the rolling percentile of values in the window ending at the bin (inclusive). Output table schema is:

IndexColumndim1dim_nrolling_BinsPerWindow_percentile_ValueColumn_Pct

Examples

Rolling 3-day median value per day

The next query calculates a 3-day median value at daily granularity. Each row in the output represents the median value for the last three bins (days), including the bin itself.

let T = 
    range idx from 0 to 24 * 10 - 1 step 1
    | project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
    | extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T  
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3)

Output

Timestamprolling_3_percentile_val_50
2018-01-01 00:00:00.000000012
2018-01-02 00:00:00.000000024
2018-01-03 00:00:00.000000036
2018-01-04 00:00:00.000000060
2018-01-05 00:00:00.000000084
2018-01-06 00:00:00.0000000108
2018-01-07 00:00:00.0000000132
2018-01-08 00:00:00.0000000156
2018-01-09 00:00:00.0000000180
2018-01-10 00:00:00.0000000204

Rolling 3-day median value per day by dimension

The same example as above, but the rolling window is now also partitioned by each value of the dimension.

let T = 
    range idx from 0 to 24 * 10 - 1 step 1
    | project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
    | extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T  
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3, EvenOrOdd)

Output

TimestampEvenOrOddrolling_3_percentile_val_50
2018-01-01 00:00:00.0000000Even12
2018-01-02 00:00:00.0000000Even24
2018-01-03 00:00:00.0000000Even36
2018-01-04 00:00:00.0000000Even60
2018-01-05 00:00:00.0000000Even84
2018-01-06 00:00:00.0000000Even108
2018-01-07 00:00:00.0000000Even132
2018-01-08 00:00:00.0000000Even156
2018-01-09 00:00:00.0000000Even180
2018-01-10 00:00:00.0000000Even204
2018-01-01 00:00:00.0000000Odd11
2018-01-02 00:00:00.0000000Odd23
2018-01-03 00:00:00.0000000Odd35
2018-01-04 00:00:00.0000000Odd59
2018-01-05 00:00:00.0000000Odd83
2018-01-06 00:00:00.0000000Odd107
2018-01-07 00:00:00.0000000Odd131
2018-01-08 00:00:00.0000000Odd155
2018-01-09 00:00:00.0000000Odd179
2018-01-10 00:00:00.0000000Odd203

9.6.9 - rows_near plugin

Learn how to use the rows_near plugin to find rows near a specified condition.

Finds rows near a specified condition.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate rows_near(Condition, NumRows [, RowsAfter ])

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
Conditionbool✔️Represents the condition to find rows around.
NumRowsint✔️The number of rows to find before and after the condition.
RowsAfterintWhen specified, overrides the number of rows to find after the condition.

Returns

Returns every row from the input that is within NumRows of a true Condition. When RowsAfter is specified, returns every row from the input that is within NumRows before or RowsAfter after a true Condition.

Example

The following example finds rows with an "Error" State, and returns the two rows before and after each "Error" record.

datatable (Timestamp:datetime, Value:long, State:string )
[
    datetime(2021-06-01), 1, "Success",
    datetime(2021-06-02), 4, "Success",
    datetime(2021-06-03), 3, "Success",
    datetime(2021-06-04), 11, "Success",
    datetime(2021-06-05), 15, "Success",
    datetime(2021-06-06), 2, "Success",
    datetime(2021-06-07), 19, "Error",
    datetime(2021-06-08), 12, "Success",
    datetime(2021-06-09), 7, "Success",
    datetime(2021-06-10), 9, "Success",
    datetime(2021-06-11), 4, "Success",
    datetime(2021-06-12), 1, "Success",
]
| sort by Timestamp asc 
| evaluate rows_near(State == "Error", 2)

Output

TimestampValueState
2021-06-05 00:00:00.000000015Success
2021-06-06 00:00:00.00000002Success
2021-06-07 00:00:00.000000019Error
2021-06-08 00:00:00.000000012Success
2021-06-09 00:00:00.00000007Success
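
The optional RowsAfter argument makes the window asymmetric. The following is a sketch (not part of the original example) that uses the same data but returns two rows before and only one row after each "Error" record; the rows for 2021-06-05 through 2021-06-08 are expected in the output:

datatable (Timestamp:datetime, Value:long, State:string )
[
    datetime(2021-06-01), 1, "Success",
    datetime(2021-06-02), 4, "Success",
    datetime(2021-06-03), 3, "Success",
    datetime(2021-06-04), 11, "Success",
    datetime(2021-06-05), 15, "Success",
    datetime(2021-06-06), 2, "Success",
    datetime(2021-06-07), 19, "Error",
    datetime(2021-06-08), 12, "Success",
    datetime(2021-06-09), 7, "Success",
    datetime(2021-06-10), 9, "Success",
    datetime(2021-06-11), 4, "Success",
    datetime(2021-06-12), 1, "Success",
]
| sort by Timestamp asc 
| evaluate rows_near(State == "Error", 2, 1)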

9.6.10 - sequence_detect plugin

Learn how to use the sequence_detect plugin to detect sequence occurrences based on provided predicates.

Detects sequence occurrences based on provided predicates. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate sequence_detect(TimelineColumn, MaxSequenceStepWindow, MaxSequenceSpan, Expr1, Expr2, …, Dim1, Dim2, …)

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
TimelineColumnstring✔️The column reference representing the timeline. It must be present in the source expression.
MaxSequenceStepWindowtimespan✔️The value of the max allowed timespan between two sequential steps in the sequence.
MaxSequenceSpantimespan✔️The max timespan for the sequence to complete all steps.
Expr1, Expr2, …string✔️The boolean predicate expressions defining sequence steps.
Dim1, Dim2, …string✔️The dimension expressions that are used to correlate sequences.

Returns

Returns a single table where each row in the table represents a single sequence occurrence:

  • Dim1, Dim2, …: dimension columns that were used to correlate sequences.
  • Expr1TimelineColumn, Expr2TimelineColumn, …: Columns with time values, representing the timeline of each sequence step.
  • Duration: the overall sequence time window

Examples

The following query looks at the table T to search for relevant data from a specified time period.

T | evaluate sequence_detect(datetime_column, 10m, 1h, e1 = (Col1 == 'Val'), e2 = (Col2 == 'Val2'), Dim1, Dim2)

Exploring Storm Events

The following query looks at the table StormEvents (weather statistics for 2007) and shows cases where a sequence of ‘Excessive Heat’ was followed by ‘Wildfire’ within 5 days.

StormEvents
| evaluate sequence_detect(
               StartTime,
               5d,  // step max-time
               5d,  // sequence max-time
               heat=(EventType == "Excessive Heat"), 
               wildfire=(EventType == 'Wildfire'), 
               State
           )

Output

Stateheat_StartTimewildfire_StartTimeDuration
CALIFORNIA2007-05-08 00:00:00.00000002007-05-08 16:02:00.000000016:02:00
CALIFORNIA2007-05-08 00:00:00.00000002007-05-10 11:30:00.00000002.11:30:00
CALIFORNIA2007-07-04 09:00:00.00000002007-07-05 23:01:00.00000001.14:01:00
SOUTH DAKOTA2007-07-23 12:00:00.00000002007-07-27 09:00:00.00000003.21:00:00
TEXAS2007-08-10 08:00:00.00000002007-08-11 13:56:00.00000001.05:56:00
CALIFORNIA2007-08-31 08:00:00.00000002007-09-01 11:28:00.00000001.03:28:00
CALIFORNIA2007-08-31 08:00:00.00000002007-09-02 13:30:00.00000002.05:30:00
CALIFORNIA2007-09-02 12:00:00.00000002007-09-02 13:30:00.000000001:30:00
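
The Duration column can be used for further filtering. As a sketch (not part of the original example), the following keeps only those Excessive Heat -> Wildfire sequences that completed within one day:

StormEvents
| evaluate sequence_detect(
               StartTime,
               5d,  // step max-time
               5d,  // sequence max-time
               heat=(EventType == "Excessive Heat"), 
               wildfire=(EventType == 'Wildfire'), 
               State
           )
| where Duration <= 1d
| order by Duration asc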

9.6.11 - session_count plugin

Learn how to use the session_count plugin to calculate the session count based on the ID column over a timeline.

Calculates the session count based on the ID column over a timeline. The plugin is invoked with the evaluate operator.

Syntax

TabularExpression | evaluate session_count(IdColumn, TimelineColumn, Start, End, Bin, LookBackWindow [, dim1, dim2, …])

Parameters

NameTypeRequiredDescription
TabularExpressionstring✔️The tabular expression that serves as input.
IdColumnstring✔️The name of the column with ID values that represents user activity.
TimelineColumnstring✔️The name of the column that represents the timeline.
Startscalar✔️The start of the analysis period.
Endscalar✔️The end of the analysis period.
Binscalar✔️The session’s analysis step period.
LookBackWindowscalar✔️The session lookback period. If the ID from IdColumn appears in a time window within LookBackWindow, the session is considered to be an existing one. If the ID doesn’t appear, then the session is considered to be new.
dim1, dim2, …stringA list of the dimensions columns that slice the session count calculation.

Returns

Returns a table that has the session count values for each timeline period and for each existing dimensions combination.

Output table schema is:

TimelineColumndim1..dim_ncount_sessions
type: as of TimelineColumn......long

Examples

For this example, the data is deterministic, and we use a table with two columns:

  • Timeline: a running number from 1 to 10,000
  • Id: ID of the user from 1 to 50

An Id appears at a specific Timeline slot if it’s a divisor of that Timeline value (Timeline % Id == 0).

An event with Id==1 will appear at any Timeline slot, an event with Id==2 at every second Timeline slot, and so on.

Here are 20 lines of the data:

let _data = range Timeline from 1 to 10000 step 1
    | extend __key = 1
    | join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
    | where Timeline % Id == 0
    | project Timeline, Id;
// Look on few lines of the data
_data
| order by Timeline asc, Id asc
| take 20

Output

TimelineId
11
21
22
31
33
41
42
44
51
55
61
62
63
66
71
77
81
82
84
88

Let’s define a session in the following terms: a session is considered active as long as the user (Id) appears at least once within a timeframe of 100 time slots, while the session lookback window is 41 time slots.

The next query shows the count of active sessions according to the above definition.

let _data = range Timeline from 1 to 9999 step 1
    | extend __key = 1
    | join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
    | where Timeline % Id == 0
    | project Timeline, Id;
// End of data definition
_data
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41)
| render linechart 

Linechart showing the count of active sessions over the timeline, as produced by the query above.
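
The dimension parameters aren't demonstrated above. As a sketch (using the same synthetic data plus a hypothetical Parity dimension derived from the Id), the session count can be sliced per dimension value:

let _data = range Timeline from 1 to 9999 step 1
    | extend __key = 1
    | join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
    | where Timeline % Id == 0
    | project Timeline, Id;
// End of data definition
_data
| extend Parity = iff(Id % 2 == 0, "even", "odd")
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41, Parity)
| render linechart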

9.6.12 - sliding_window_counts plugin

Learn how to use the sliding_window_counts plugin to calculate counts and distinct counts of values in a sliding window over a lookback period.

Calculates counts and distinct count of values in a sliding window over a lookback period, using the technique described in the Perform aggregations over a sliding window example. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate sliding_window_counts(IdColumn, TimelineColumn, Start, End, LookbackWindow, Bin , [dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column representing the timeline.
Startint, long, real, datetime, or timespan✔️The analysis start period.
Endint, long, real, datetime, or timespan✔️The analysis end period.
LookbackWindowint, long, real, datetime, or timespan✔️The lookback period. This value should be a multiple of the Bin value, otherwise the LookbackWindow will be rounded down to a multiple of the Bin value. For example, for dcount users in past 7d: LookbackWindow = 7d.
Binint, long, real, datetime, timespan, or string✔️The analysis step period. The possible string values are week, month, and year for which all periods will be startofweek, startofmonth, startofyear respectively.
dim1, dim2, …stringA list of the dimensions columns that slice the activity metrics calculation.

Returns

Returns a table that has the count and distinct count values of Ids in the lookback period, for each timeline period (by bin) and for each existing dimensions combination.

Output table schema is:

TimelineColumndim1..dim_ncountdcount
type: as of TimelineColumn......longlong

Example

Calculate the count and distinct count of users over the lookback window (three days in this example), for each day in the analysis period.

let start = datetime(2017 - 08 - 01);
let end = datetime(2017 - 08 - 07); 
let lookbackWindow = 3d;  
let bin = 1d;
let T = datatable(UserId: string, Timestamp: datetime)
    [
    'Bob', datetime(2017 - 08 - 01), 
    'David', datetime(2017 - 08 - 01), 
    'David', datetime(2017 - 08 - 01), 
    'John', datetime(2017 - 08 - 01), 
    'Bob', datetime(2017 - 08 - 01), 
    'Ananda', datetime(2017 - 08 - 02),  
    'Atul', datetime(2017 - 08 - 02), 
    'John', datetime(2017 - 08 - 02), 
    'Ananda', datetime(2017 - 08 - 03), 
    'Atul', datetime(2017 - 08 - 03), 
    'Atul', datetime(2017 - 08 - 03), 
    'John', datetime(2017 - 08 - 03), 
    'Bob', datetime(2017 - 08 - 03), 
    'Betsy', datetime(2017 - 08 - 04), 
    'Bob', datetime(2017 - 08 - 05), 
];
T
| evaluate sliding_window_counts(UserId, Timestamp, start, end, lookbackWindow, bin)

Output

TimestampCountdcount
2017-08-01 00:00:00.000000053
2017-08-02 00:00:00.000000085
2017-08-03 00:00:00.0000000135
2017-08-04 00:00:00.000000095
2017-08-05 00:00:00.000000075
2017-08-06 00:00:00.000000022
2017-08-07 00:00:00.000000011

9.6.13 - User Analytics

This article describes User Analytics.

This section describes Kusto extensions (plugins) for user analytics scenarios.

ScenarioPluginDetailsUser Experience
Counting new users over timeactivity_counts_metricsReturns counts/dcounts/new counts for each time window. Each time window is compared to all previous time windowsKusto.Explorer: Report Gallery
Period-over-period: retention/churn rate and new usersactivity_metricsReturns dcount, retention/churn rate for each time window. Each time window is compared to previous time windowKusto.Explorer: Report Gallery
Users count and dcount over sliding windowsliding_window_countsFor each time window, returns count and dcount over a lookback period, in a sliding window manner
New-users cohort: retention/churn rate and new usersnew_activity_metricsCompares between cohorts of new users (all users that were first seen in time window). Each cohort is compared to all prior cohorts. Comparison takes into account all previous time windowsKusto.Explorer: Report Gallery
Active Users: distinct countsactive_users_countReturns distinct users for each time window. A user is only considered if it appears in at least X distinct periods in a specified lookback period.
User Engagement: DAU/WAU/MAUactivity_engagementCompares between an inner time window (for example, daily) and an outer (for example, weekly) for computing engagement (for example, DAU/WAU)Kusto.Explorer: Report Gallery
Sessions: count active sessionssession_countCounts sessions, where a session is defined by a time period - a user record is considered a new session, if it hasn’t been seen in the lookback period from current record
Funnels: previous and next state sequence analysisfunnel_sequenceCounts distinct users who have taken a sequence of events, and the previous or next events that led or were followed by the sequence. Useful for constructing sankey diagrams
Funnels: sequence completion analysisfunnel_sequence_completionComputes the distinct count of users that have completed a specified sequence in each time window


10 - Query statements

10.1 - Alias statement

Learn how to use an alias statement to define an alias for a database that is used for a query.

An alias statement defines a shorthand name (alias) for a database, which can then be used to reference that database later in the same query.

Syntax

alias database DatabaseAliasName = cluster("QueryURI").database("DatabaseName")

Parameters

NameTypeRequiredDescription
DatabaseAliasNamestring✔️An existing name or new database alias name. You can escape the name with brackets. For example, ["Name with spaces"].
QueryURIstring✔️The URI that can be used to run queries or management commands.
DatabaseNamestring✔️The name of the database to give an alias.

Examples

First, count the number of records in the StormEvents table of the Samples database.

StormEvents
| count

Output

Count
59066

Then, give an alias to the Samples database and use that name to check the record count of the StormEvents table.

alias database samplesAlias = cluster("https://help.kusto.windows.net").database("Samples");
database("samplesAlias").StormEvents | count

Output

Count
59066

Create an alias name that contains spaces using the bracket syntax.

alias database ["Samples Database Alias"] = cluster("https://help.kusto.windows.net").database("Samples");
database("Samples Database Alias").StormEvents | count

Output

Count
59066

10.2 - Batches

This article describes Batches.

A query can include multiple tabular expression statements, as long as they’re delimited by a semicolon (;) character. The query then returns multiple tabular results. Results are produced by the tabular expression statements and ordered according to the order of the statements in the query text.

Examples

The following examples show how to create multiple tables simultaneously.

Name tabular results

The following query produces two tabular results. User agent tools can then display those results with the appropriate name associated with each (Count of events in Florida and Count of events in Guam, respectively).

StormEvents | where State == "FLORIDA" | count | as ['Count of events in Florida'];
StormEvents | where State == "GUAM" | count | as ['Count of events in Guam']

Output

Count of events in Florida

Count
1042

Count of events in Guam

Count
4

Share a calculation

Batching is useful for scenarios where a common calculation is shared by multiple subqueries, such as for dashboards. If the common calculation is complex, use the materialize() function and construct the query so that it will be executed only once.

let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000;
m | where n < 10

Output

Table 1

Staten
ILLINOIS2022
IOWA2337
KANSAS3166
MISSOURI2016
TEXAS4701

Table 2

Staten
GUAM2022
GULF OF ALASKA2337
HAWAII WATERS3166
LAKE ONTARIO2016

10.3 - Let statement

Learn how to use the Let statement to set a variable name to define an expression or a function.

A let statement is used to set a variable name equal to an expression or a function, or to create views.

let statements are useful for:

  • Breaking up a complex expression into multiple parts, each represented by a variable.
  • Defining constants outside of the query body for readability.
  • Defining a variable once and using it multiple times within a query.

If the variable previously represented another value, for example in nested statements, the innermost let statement applies.

To optimize multiple uses of the let statement within a single query, see Optimize queries that use named expressions.

Syntax: Scalar or tabular expressions

let Name = Expression

Parameters

NameTypeRequiredDescription
Namestring✔️The variable name. You can escape the name with brackets. For example, ["Name with spaces"].
Expressionstring✔️An expression with a scalar or tabular result. For example, an expression with a scalar result would be let one=1;, and an expression with a tabular result would be `let RecentLog = Logs

Syntax: View or function

let Name = [view] ([ Parameters ]) { FunctionBody }

Parameters

NameTypeRequiredDescription
FunctionBodystring✔️An expression that yields a user defined function.
viewstringOnly relevant for a parameter-less let statement. When used, the let statement is included in queries with a union operator with wildcard selection of the tables/views. For an example, see Create a view or virtual table.
ParametersstringZero or more comma-separated tabular or scalar function parameters.

For each parameter of tabular type, the parameter should be in the format TableName:TableSchema, in which TableSchema is either a comma-separated list of columns in the format ColumnName:ColumnType or a wildcard (*). If columns are specified, then the input tabular argument must contain these columns. If a wildcard is specified, then the input tabular argument can have any schema. To reference columns in the function body, they must be specified. For examples, see Tabular argument with schema and Tabular argument with wildcard.

For each parameter of scalar type, provide the parameter name and parameter type in the format Name:Type. The name can appear in the FunctionBody and is bound to a particular value when the user defined function is invoked. The only supported types are bool, string, long, datetime, timespan, real, dynamic, and the aliases to these types.

Examples

The examples in this section show how to use the syntax to help you get started.

The query examples show the syntax and example usage of the operator, statement, or function.

Define scalar values

The following example uses a scalar expression statement.

let n = 10;  // number
let place = "Dallas";  // string
let cutoff = ago(62d); // datetime 
Events 
| where timestamp > cutoff 
    and city == place 
| take n

The following example binds the name some number using the ['name'] notation, and then uses it in a tabular expression statement.

let ['some number'] = 20;
range y from 0 to ['some number'] step 5

Output

y
0
5
10
15
20

Create a user defined function with scalar calculation

This example uses the let statement with arguments for scalar calculation. The query defines function MultiplyByN for multiplying two numbers.

let MultiplyByN = (val:long, n:long) { val * n };
range x from 1 to 5 step 1 
| extend result = MultiplyByN(x, 5)

Output

xresult
15
210
315
420
525

Create a user defined function that trims input

The following example removes leading and trailing ones from the input.

let TrimOnes = (s:string) { trim("1", s) };
range x from 10 to 15 step 1 
| extend result = TrimOnes(tostring(x))

Output

xresult
100
11
122
133
144
155

Use multiple let statements

This example defines two let statements where one statement (foo2) uses another (foo1).

let foo1 = (_start:long, _end:long, _step:long) { range x from _start to _end step _step};
let foo2 = (_step:long) { foo1(1, 100, _step)};
foo2(2) | count

Output

result
50

Create a view or virtual table

This example shows you how to use a let statement to create a view or virtual table.

let Range10 = view () { range MyColumn from 1 to 10 step 1 };
let Range20 = view () { range MyColumn from 1 to 20 step 1 };
search MyColumn == 5

Output

$tableMyColumn
Range105
Range205

Use a materialize function

The materialize() function lets you cache subquery results during the time of query execution. When you use the materialize() function, the data is cached, and any subsequent invocation of the result uses cached data.

let totalPagesPerDay = PageViews
| summarize by Page, Day = startofday(Timestamp)
| summarize count() by Day;
let materializedScope = PageViews
| summarize by Page, Day = startofday(Timestamp);
let cachedResult = materialize(materializedScope);
cachedResult
| project Page, Day1 = Day
| join kind = inner
(
    cachedResult
    | project Page, Day2 = Day
)
on Page
| where Day2 > Day1
| summarize count() by Day1, Day2
| join kind = inner
    totalPagesPerDay
on $left.Day1 == $right.Day
| project Day1, Day2, Percentage = count_*100.0/count_1

Output

Day1Day2Percentage
2016-05-01 00:00:00.00000002016-05-02 00:00:00.000000034.0645725975255
2016-05-01 00:00:00.00000002016-05-03 00:00:00.000000016.618368960101
2016-05-02 00:00:00.00000002016-05-03 00:00:00.000000014.6291376489636

Using nested let statements

Nested let statements are permitted, including within a user defined function expression. Let statements and arguments apply in both the current and inner scope of the function body.

let start_time = ago(5h); 
let end_time = start_time + 2h; 
T | where Time > start_time and Time < end_time | ...

Tabular argument with schema

The following example specifies that the table parameter T must have a column State of type string. The table T may include other columns as well, but they can’t be referenced in the function StateState because they aren’t declared.

let StateState=(T: (State: string)) { T | extend s_s=strcat(State, State) };
StormEvents
| invoke StateState()
| project State, s_s

Output

States_s
ATLANTIC SOUTHATLANTIC SOUTHATLANTIC SOUTH
FLORIDAFLORIDAFLORIDA
FLORIDAFLORIDAFLORIDA
GEORGIAGEORGIAGEORGIA
MISSISSIPPIMISSISSIPPIMISSISSIPPI

Tabular argument with wildcard

The table parameter T can have any schema, and the function CountRecordsInTable will work.

let CountRecordsInTable=(T: (*)) { T | count };
StormEvents | invoke CountRecordsInTable()

Output

Count
59,066

10.4 - Pattern statement

Learn how to use pattern statements to map string tuples to tabular expressions.

A pattern is a construct that maps string tuples to tabular expressions.

Each pattern must declare a pattern name and optionally define a pattern mapping. Patterns that define a mapping return a tabular expression when invoked. Separate any two statements by a semicolon.

Empty patterns are patterns that are declared but don’t define a mapping. When invoked, they return error SEM0036 along with the details of the missing pattern definitions in the HTTP header.

Middle-tier applications that provide a Kusto Query Language (KQL) experience can use the returned details as part of their process to enrich KQL query results. For more information, see Working with middle-tier applications.

Syntax

  • Declare an empty pattern:

    declare pattern PatternName ;

  • Declare and define a pattern:

    declare pattern PatternName = (ArgName : ArgType [, … ]) [[ PathName : PathArgType ]]

    {

          ( ArgValue1_1 [, ArgValue2_1, … ] ) [ .[ PathValue_1 ] ] = { expression1 } ;

        [ ( ArgValue1_2 [, ArgValue2_2, … ] ) [ .[ PathValue_2 ] ] = { expression2 } ; … ]

    } ;

  • Invoke a pattern:

    • PatternName ( ArgValue1 [, ArgValue2 …] ).PathValue
    • PatternName ( ArgValue1 [, ArgValue2 …] ).["PathValue"]

Parameters

NameTypeRequiredDescription
PatternNamestring✔️The name of the pattern.
ArgNamestring✔️The name of the argument. Patterns can have one or more arguments.
ArgTypestring✔️The scalar data type of the ArgName argument. Possible values: string
PathNamestringThe name of the path argument. Patterns can have no path or one path.
PathArgTypestringThe type of the PathName argument. Possible values: string
ArgValuestring✔️The ArgName and optional PathName tuple values to be mapped to an expression.
PathValuestringThe value to map for PathName.
expressionstring✔️A tabular or lambda expression that references a function returning tabular data. For example: `Logs

Examples

The examples in this section show how to use the syntax to help you get started.

Define a simple pattern

This example defines a pattern that maps states to an expression that returns its capital/major city.

declare pattern country = (name:string)[state:string]
{
  ("USA").["New York"] = { print Capital = "Albany" };
  ("USA").["Washington"] = { print Capital = "Olympia" };
  ("Canada").["Alberta"] = { print Capital = "Edmonton" };
};
country("Canada").Alberta

Output

Capital
Edmonton

Define a scoped pattern

This example defines a pattern to scope data and metrics of application data. The pattern is invoked to return a union of the data.

declare pattern App = (applicationId:string)[scope:string]
{
    ('a1').['Data']    = { range x from 1 to 5 step 1 | project App = "App #1", Data    = x };
    ('a1').['Metrics'] = { range x from 1 to 5 step 1 | project App = "App #1", Metrics = rand() };
    ('a2').['Data']    = { range x from 1 to 5 step 1 | project App = "App #2", Data    = 10 - x };
    ('a3').['Metrics'] = { range x from 1 to 5 step 1 | project App = "App #3", Metrics = rand() };
};
union App('a2').Data, App('a1').Metrics

Output

AppDataMetrics
App #29
App #28
App #27
App #26
App #25
App #10.53674122855537532
App #10.78304713305654439
App #10.20168860732346555
App #10.13249123867679469
App #10.19388305330563443

Normalization

There are syntax variations for invoking patterns. For example, the following union returns a single pattern expression since all the invocations are of the same pattern.

declare pattern app = (applicationId:string)[eventType:string]
{
    ("ApplicationX").["StopEvents"] = { database("AppX").Events | where EventType == "StopEvent" };
    ("ApplicationX").["StartEvents"] = { database("AppX").Events | where EventType == "StartEvent" };
};
union
  app("ApplicationX").StartEvents,
  app('ApplicationX').StartEvents,
  app("ApplicationX").['StartEvents'],
  app("ApplicationX").["StartEvents"]

No wildcards

There’s no special treatment given to wildcards in a pattern. For example, the following query returns a single missing pattern invocation.

declare pattern app = (applicationId:string)[eventType:string]
{
    ("ApplicationX").["StopEvents"] = { database("AppX").Events | where EventType == "StopEvent" };
    ("ApplicationX").["StartEvents"] = { database("AppX").Events | where EventType == "StartEvent" };
};
union app("ApplicationX").["*"]
| count

Output

The query fails with a semantic error, because the wildcard value isn’t mapped by the pattern and is reported as a single missing pattern invocation.

Work with middle-tier applications

A middle-tier application provides its users with the ability to use KQL and wants to enhance the experience by enriching the query results with augmented data from its internal service.

To this end, the application provides users with a pattern statement that returns tabular data that their users can use in their queries. The pattern’s arguments are the keys the application will use to retrieve the enrichment data.

When the user runs the query, the application doesn’t parse the query itself but instead uses the error returned by an empty pattern to retrieve the keys it requires. So it prepends the query with the empty pattern declaration, sends it to the cluster for processing, and then parses the returned HTTP header to retrieve the values of missing pattern arguments. The application uses these values to look up the enrichment data and builds a new declaration that defines the appropriate enrichment data mapping.

Finally, the application prepends the new definition to the query, resends it for processing, and returns the result it receives to the user.

Example

In the examples, a pattern is declared, defined, and then invoked.

Declare an empty pattern

In this example, a middle-tier application enriches queries with longitude/latitude locations. The application uses an internal service to map IP addresses to longitude/latitude locations, and provides a pattern called map_ip_to_longlat. When the query is run, it returns an error with missing pattern definitions:

map_ip_to_longlat("10.10.10.10")

Declare and define a pattern

The application does not parse this query and hence does not know which IP address (10.10.10.10) was passed to the pattern. So it prepends the user query with an empty map_ip_to_longlat pattern declaration and sends it for processing:

declare pattern map_ip_to_longlat;
map_ip_to_longlat("10.10.10.10")

The application receives an error response (SEM0036) indicating a missing pattern definition, with the details of the missing pattern arguments returned in the HTTP header.

Invoke a pattern

The application inspects the error, determines that the error indicates a missing pattern reference, and retrieves the missing IP address (10.10.10.10). It uses the IP address to look up the enrichment data in its internal service and builds a new pattern defining the mapping of the IP address to the corresponding longitude and latitude data. The new pattern is prepended to the user’s query and run again.

This time the query succeeds because the enrichment data is now declared in the query, and the result is sent to the user.

declare pattern map_ip_to_longlat = (address:string)
{
  ("10.10.10.10") = { print Lat=37.405992, Long=-122.078515 };
};
map_ip_to_longlat("10.10.10.10")

Output

LatLong
37.405992-122.078515

10.5 - Query parameters declaration statement

Learn how to use the query parameters declaration statement to parameterize queries and protect against injection attacks.

Queries sent to Kusto may include, in addition to the query text itself, a set of name-value pairs called query parameters. The query may reference one or more of these values by specifying their names and types in a query parameters declaration statement.

Query parameters have two main uses:

  • As a protection mechanism against injection attacks.
  • As a way to parameterize queries.

In particular, client applications that combine user-provided input in queries that they then send to Kusto should use the mechanism to protect against the Kusto equivalent of SQL Injection attacks.

Declaring query parameters

To reference query parameters, the query text, or the functions it uses, must first declare which query parameters it uses. For each parameter, the declaration provides the name and scalar type. Optionally, the parameter can also have a default value, which is used if the request doesn’t provide a concrete value for the parameter. Kusto then parses the query parameter’s value according to its normal parsing rules for that type.

Syntax

declare query_parameters ( Name1 : Type1 [= DefaultValue1] [,…] );

Parameters

NameTypeRequiredDescription
Name1string✔️The name of a query parameter used in the query.
Type1string✔️The corresponding type, such as string or datetime. The values provided by the user are encoded as strings. The appropriate parse method is applied to the query parameter to get a strongly typed value.
DefaultValue1stringA default value for the parameter. This value must be a literal of the appropriate scalar type.

Example

The examples in this section show how to use the syntax to help you get started.

Declare query parameters

This query retrieves storm events from the StormEvents table where the total number of direct and indirect injuries exceeds a specified threshold (default is 90). It then projects the EpisodeId, EventType, and the total number of injuries for each of these events.

declare query_parameters(maxInjured:long = 90);
StormEvents 
| where InjuriesDirect + InjuriesIndirect > maxInjured
| project EpisodeId, EventType, totalInjuries = InjuriesDirect + InjuriesIndirect

Output

EpisodeId   EventType        totalInjuries
12459       Winter Weather   137
10477       Excessive Heat   200
10391       Heat             187
10217       Excessive Heat   422
10217       Excessive Heat   519

Specify query parameters in a client application

The names and values of query parameters are provided as string values by the application making the query. No name may repeat.

The interpretation of the values is done according to the query parameters declaration statement. Every value is parsed as if it were a literal in the body of a query. The parsing is done according to the type specified by the query parameters declaration statement.

REST API

Query parameters are provided by client applications through the properties slot of the request body’s JSON object, in a nested property bag called Parameters. For example, here’s the body of a REST API call to Kusto that calculates the age of some user, presumably by having the application ask for the user’s birthday.

{
    "ns": null,
    "db": "myDB",
    "csl": "declare query_parameters(birthday:datetime); print strcat(\"Your age is: \", tostring(now() - birthday))",
    "properties": "{\"Options\":{},\"Parameters\":{\"birthday\":\"datetime(1970-05-11)\",\"courses\":\"dynamic(['Java', 'C++'])\"}}"
}

Kusto SDKs

To learn how to provide the names and values of query parameters when using Kusto client libraries, see Use query parameters to protect user input.

Kusto.Explorer

To set the query parameters sent when making a request to the service, use the Query parameters “wrench” icon (ALT + P).

10.6 - Query statements

This article lists the types of query statements.

A query consists of one or more query statements, delimited by a semicolon (;). At least one of these query statements must be a tabular expression statement, which is a statement that produces one or more tabular results. When the query has more than one tabular expression statement, the query has a batch of tabular expression statements, and the tabular results generated by these statements are all returned by the query.
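
For example, the following query contains two statements separated by a semicolon: a let statement and the tabular expression statement that produces the result. This is a minimal sketch that reuses the StormEvents table from the examples in this article.

let minInjured = 10;
StormEvents
| where InjuriesDirect + InjuriesIndirect > minInjured
| project EpisodeId, EventType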

There are two types of query statements:

  • Statements that are primarily used by users (user query statements),
  • Statements that have been designed to support scenarios in which mid-tier applications take user queries and send a modified version of them to Kusto (application query statements).

Some query statements are useful in both scenarios.

User query statements

Following is a list of user query statements:

  • A let statement defines a binding between a name and an expression. Let statements can be used to break a long query into small named parts that are easier to understand.

  • A set statement sets a request property that affects how the query is processed and its results returned.

  • A tabular expression statement, the most important query statement, returns the “interesting” data back as results.

Application query statements

Following is a list of application query statements:

  • An alias statement defines an alias to another database (in the same cluster or on a remote cluster).

  • A pattern statement, which can be used by applications that are built on top of Kusto and expose the query language to their users, to inject themselves into the query name resolution process.

  • A query parameters statement, which is used by applications that are built on top of Kusto to protect themselves against injection attacks (similar to how command parameters protect SQL against SQL injection attacks.)

  • A restrict statement, which is used by applications that are built on top of Kusto to restrict queries to a specific subset of data in Kusto (including restricting access to specific columns and records.)

10.7 - Restrict statement

Learn how to use the restrict statement to limit tabular views that are visible to subsequent query statements.

The restrict statement limits the set of table/view entities that are visible to the query statements that follow it. For example, in a database that includes two tables (A, B), the application can prevent the rest of the query from accessing B and expose only a limited form of table A by using a view.

The restrict statement’s main scenario is middle-tier applications that accept queries from users and want to apply a row-level security mechanism over those queries. The middle-tier application can prefix the user’s query with a logical model: a set of let statements that define views restricting the user’s access to data (for example, T | where UserId == "..."). The restrict statement is added as the last statement, and it limits the user’s access to the logical model only.

Syntax

restrict access to (EntitySpecifiers)

Parameters

NameTypeRequiredDescription
EntitySpecifiersstring✔️One or more comma-separated entity specifiers. The possible values are:
- An identifier defined by a let statement as a tabular view
- A table or function reference, similar to one used by a union statement
- A pattern defined by a pattern declaration

Examples

The examples in this section show how to use the syntax to help you get started.

Let statement

The example uses a let statement that appears before the restrict statement.

// Limit access to 'Test' let statement only
let Test = () { print x=1 };
restrict access to (Test);

Tables or functions

The example uses references to tables or functions that are defined in the database metadata.

// Assuming the database that the query uses has table Table1 and Func1 defined in the metadata, 
// and other database 'DB2' has Table2 defined in the metadata

restrict access to (database().Table1, database().Func1, database('DB2').Table2);

Patterns

The example uses wildcard patterns that can match multiple let statements or tables/functions.

let Test1 = () { print x=1 };
let Test2 = () { print y=1 };
restrict access to (*);
// Now access is restricted to Test1, Test2 and no tables/functions are accessible.

// Assuming the database that the query uses has table Table1 and Func1 defined in the metadata.
// Assuming that database 'DB2' has table Table2 and Func2 defined in the metadata
restrict access to (database().*);
// Now access is restricted to all tables/functions of the current database ('DB2' is not accessible).

// Assuming the database that the query uses has table Table1 and Func1 defined in the metadata.
// Assuming that database 'DB2' has table Table2 and Func2 defined in the metadata
restrict access to (database('DB2').*);
// Now access is restricted to all tables/functions of the database 'DB2'

Prevent user from querying other user data

The example shows how a middle-tier application can prepend a user’s query with a logical model that prevents the user from querying any other user’s data.

// Assume the database has a single table, UserData,
// with a column called UserID and other columns that hold
// per-user private information.
//
// The middle-tier application generates the following statements.
// Note that "username@domain.com" is something the middle-tier application
// derives per-user as it authenticates the user.
let RestrictedData = view () { UserData | where UserID == "username@domain.com" };
restrict access to (RestrictedData);
// The rest of the query is something that the user types.
// This part can only reference RestrictedData; attempting to reference UserData
// will fail.
RestrictedData | summarize MonthlySalary=sum(Salary) by Year, Month
// Restricting access to Table1 in the current database (database() called without parameters)
restrict access to (database().Table1);
Table1 | count

// Restricting access to Table1 in the current database and Table2 in database 'DB2'
restrict access to (database().Table1, database('DB2').Table2);
union 
    (Table1),
    (database('DB2').Table2)
| count

// Restricting access to Test statement only
let Test = () { range x from 1 to 10 step 1 };
restrict access to (Test);
Test
 
// Assume that there is a table called Table1, Table2 in the database
let View1 = view () { Table1 | project Column1 };
let View2 = view () { Table2 | project Column1, Column2 };
restrict access to (View1, View2);
 
// When these statements appear before the query, the following access works
let View1 = view () { Table1 | project Column1 };
let View2 = view () { Table2 | project Column1, Column2 };
restrict access to (View1, View2);
View1 |  count
 
// When these statements appear before the query, the following access is not allowed
let View1 = view () { Table1 | project Column1 };
let View2 = view () { Table2 | project Column1, Column2 };
restrict access to (View1, View2);
Table1 |  count

10.8 - Set statement

Learn how to use the set statement to set a request property for the duration of the query.

The set statement is used to set a request property for the duration of the query.

Request properties control how a query executes and returns results. They can be boolean flags, which are false by default, or have an integer value. A query may contain zero, one, or more set statements. Set statements affect only the tabular expression statements that trail them in the program order. Any two statements must be separated by a semicolon.

Request properties aren’t formally a part of the Kusto Query Language and may be modified without being considered as a breaking language change.

Syntax

set OptionName [= OptionValue]

Parameters

NameTypeRequiredDescription
OptionNamestring✔️The name of the request property.
OptionValueThe value of the request property.

Example

This query enables query tracing and then fetches the first 100 records from the StormEvents table.

set querytrace;
StormEvents | take 100

Output

The table shows the first few results.

StartTime              EndTime                EpisodeId  EventId  State      EventType
2007-01-15T12:30:00Z   2007-01-15T16:00:00Z   1636       7821     OHIO       Flood
2007-08-03T01:50:00Z   2007-08-03T01:50:00Z   10085      56083    NEW YORK   Thunderstorm Wind
2007-08-03T15:33:00Z   2007-08-03T15:33:00Z   10086      56084    NEW YORK   Hail
2007-08-03T15:40:00Z   2007-08-03T15:40:00Z   10086      56085    NEW YORK   Hail
2007-08-03T23:15:00Z   2007-08-05T04:30:00Z   6569       38232    NEBRASKA   Flood
2007-08-06T18:19:00Z   2007-08-06T18:19:00Z   6719       39781    IOWA       Thunderstorm Wind

10.9 - Tabular expression statements

Learn how to use tabular expression statements to produce tabular datasets.

The tabular expression statement is what people usually have in mind when they talk about queries. This statement usually appears last in the statement list, and both its input and its output consist of tables or tabular datasets. Any two statements must be separated by a semicolon.

A tabular expression statement is generally composed of tabular data sources such as tables, tabular data operators such as filters and projections, and optional rendering operators. The composition is represented by the pipe character (|), giving the statement a regular form that visually represents the flow of tabular data from left to right. Each operator accepts a tabular dataset “from the pipe”, and other inputs including more tabular datasets from the body of the operator, then emits a tabular dataset to the next operator that follows.

Syntax

Source | Operator1 | Operator2 | RenderInstruction

Parameters

NameTypeRequiredDescription
Sourcestring✔️A tabular data source. See Tabular data sources.
Operatorstring✔️Tabular data operators, such as filters and projections.
RenderInstructionstringRendering operators or instructions.

Tabular data sources

A tabular data source produces sets of records, to be further processed by tabular data operators. The following list shows supported tabular data sources:

Examples

The examples in this section show how to use the syntax to help you get started.

Filter rows by condition

This query counts the number of records in the StormEvents table that have a value of “FLORIDA” in the State column.

StormEvents 
| where State == "FLORIDA"
| count

Output

Count
1042

Combine data from two tables

In this example, the join operator is used to combine records from two tabular data sources: the StormEvents table and the PopulationData table.

StormEvents 
| where InjuriesDirect + InjuriesIndirect > 50
| join (PopulationData) on State
| project State, Population, TotalInjuries = InjuriesDirect + InjuriesIndirect

Output

State        Population   TotalInjuries
ALABAMA      4918690      60
CALIFORNIA   39562900     61
KANSAS       2915270      63
MISSOURI     6153230      422
OKLAHOMA     3973710      200
TENNESSEE    6886720      187
TEXAS        29363100     137

11 - Reference

11.1 - JSONPath syntax

Learn how to use JSONPath expressions to specify data mappings and KQL functions that process dynamic objects.

JSONPath notation describes the path to one or more elements in a JSON document.

The JSONPath notation is used in the following scenarios:

The following subset of the JSONPath notation is supported:

Path expressionDescription
$Root object
.Selects the specified property in a parent object.
Use this notation if the property doesn’t contain special characters.
['property'] or ["property"]Selects the specified property in a parent object. Make sure you put single quotes or double quotes around the property name.
Use this notation if the property name contains special characters, such as spaces, or begins with a character other than A..Za..z_.
[n]Selects the n-th element from an array. Indexes are 0-based.

Example

Given the following JSON document:

{
  "Source": "Server-01",
  "Timestamp": "2023-07-25T09:15:32.123Z",
  "Log Level": "INFO",
  "Message": "Application started successfully.",
  "Details": {
    "Service": "AuthService",
    "Endpoint": "/api/login",
    "Response Code": 200,
    "Response Time": 54.21,
    "User": {
      "User ID": "user123",
      "Username": "kiana_anderson",
      "IP Address": "192.168.1.100"
    }
  }
}

You can represent each of the fields with JSONPath notation as follows:

"$.Source"                     // Source field
"$.Timestamp"                  // Timestamp field
"$['Log Level']"               // Log Level field
"$.Message"                    // Message field
"$.Details.Service"            // Service field
"$.Details.Endpoint"           // Endpoint field
"$.Details['Response Code']"   // Response Code field
"$.Details['Response Time']"   // Response Time field
"$.Details.User['User ID']"    // User ID field
"$.Details.User.Username"      // Username field
"$.Details.User['IP Address']" // IP Address field

11.2 - KQL docs navigation guide

Learn how to understand which version of KQL documentation you are viewing and how to switch to a different version.

The behavior of KQL may vary when using this language in different services. When you view any KQL documentation article by using our Learn website, the currently chosen service name is visible above the table of contents (TOC) under the Version dropdown. Switch between services using the version dropdown to see the KQL behavior for the selected service.

Change service selection

Screen capture of selecting a different version in the TOC.

HTTPS parameter view=

Applies to services

Most of the KQL articles have the words Applies to under their title. On the same line, there follows a handy listing of services with indicators of which services are relevant for this article. For example, a certain function could be applicable to Fabric and Azure Data Explorer, but not Azure Monitor or others. If you do not see the service you are using, most likely the article is not relevant to your service.

Versions

The following descriptions cover the different versions of KQL and the services they're associated with.

Microsoft Fabric

Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. Within the suite of experiences offered in Microsoft Fabric, Real-Time Intelligence is a powerful service that empowers everyone in your organization to extract insights and visualize their data in motion. It offers an end-to-end solution for event-driven scenarios, streaming data, and data logs.

The main query environment for KQL in Microsoft Fabric is the KQL queryset.

KQL in Microsoft Fabric supports query operators, functions, and management commands.

Azure Data Explorer

Azure Data Explorer is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real time. There are several query environments and integrations that can be used in Azure Data Explorer, including the web UI.

KQL in Azure Data Explorer is the full, native version, which supports all query operators, functions, and management commands.

Azure Monitor

Log Analytics is a tool in the Azure portal that's used to edit and run log queries against data in the Azure Monitor Logs store. You interact with Log Analytics in a Log Analytics workspace in the Azure portal.

KQL in Azure Monitor uses a subset of the overall KQL operators and functions.

Microsoft Sentinel

Microsoft Sentinel is a scalable, cloud-native security information and event management (SIEM) solution that delivers intelligent and comprehensive capabilities for SIEM and security orchestration, automation, and response (SOAR). Microsoft Sentinel provides cyberthreat detection, investigation, response, and proactive hunting, with a bird's-eye view across your enterprise. Microsoft Sentinel is built on top of the Azure Monitor service, and it uses Azure Monitor's Log Analytics workspaces to store all of its data.

KQL in Microsoft Sentinel uses a subset of the overall KQL operators and functions.

11.3 - Regex syntax

Learn about the regular expression syntax supported by Kusto Query Language (KQL).

This article provides an overview of regular expression syntax supported by Kusto Query Language (KQL).

There are a number of KQL operators and functions that perform string matching, selection, and extraction with regular expressions, such as matches regex, parse, and replace_regex().

In KQL, regular expressions must be encoded as string literals and follow the string quoting rules. For example, the regular expression \A is represented in KQL as "\\A". The extra backslash indicates that the other backslash is part of the regular expression \A.
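
For instance, in the following sketch the regular expression \d+ is written as the string literal "\\d+":

print hasDigits = "Order 42" matches regex "\\d+"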

Syntax

The following sections document the regular expression syntax supported by Kusto.

Match one character

PatternDescription
.Any character except new line (includes new line with s flag).
[0-9]Any ASCII digit.
[^0-9]Any character that isn’t an ASCII digit.
\dDigit (\p{Nd}).
\DNot a digit.
\pXUnicode character class identified by a one-letter name.
\p{Greek}Unicode character class (general category or script).
\PXNegated Unicode character class identified by a one-letter name.
\P{Greek}Negated Unicode character class (general category or script).

Character classes

PatternDescription
[xyz]Character class matching either x, y or z (union).
[^xyz]Character class matching any character except x, y, and z.
[a-z]Character class matching any character in range a-z.
[[:alpha:]]ASCII character class ([A-Za-z]).
[[:^alpha:]]Negated ASCII character class ([^A-Za-z]).
[x[^xyz]]Nested/grouping character class (matching any character except y and z).
[a-y&&xyz]Intersection (matching x or y).
[0-9&&[^4]]Subtraction using intersection and negation (matching 0-9 except 4).
[0-9--4]Direct subtraction (matching 0-9 except 4).
[a-g~~b-h]Symmetric difference (matching a and h only).
[\[\]]Escape in character classes (matching [ or ]).
[a&&b]Empty character class matching nothing.

Precedence in character classes is from most binding to least binding:

  1. Ranges: [a-cd] == [[a-c]d]
  2. Union: [ab&&bc] == [[ab]&&[bc]]
  3. Intersection, difference, symmetric difference: All have equivalent precedence, and are evaluated from left-to-right. For example, [\pL--\p{Greek}&&\p{Uppercase}] == [[\pL--\p{Greek}]&&\p{Uppercase}].
  4. Negation: [^a-z&&b] == [^[a-z&&b]].
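
As a quick illustration of character classes, the following sketch extracts every run of ASCII digits from a string. The parentheses are needed because extract_all() requires at least one capture group.

print numbers = extract_all(@"([0-9]+)", "a1b22c333")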

Composites

PatternDescription
xyConcatenation (x followed by y)
x|yAlternation (x or y , prefer x)

Repetitions

PatternDescription
x*Zero or more of x (greedy)
x+One or more of x (greedy)
x?Zero or one of x (greedy)
x*?Zero or more of x (ungreedy/lazy)
x+?One or more of x (ungreedy/lazy)
x??Zero or one of x (ungreedy/lazy)
x{n,m}At least n x and at most m x (greedy)
x{n,}At least n x (greedy)
x{n}Exactly n x
x{n,m}?At least n x and at most m x (ungreedy/lazy)
x{n,}?At least n x (ungreedy/lazy)
x{n}?Exactly n x
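
The following sketch contrasts a greedy and a lazy repetition; capture group 0 refers to the entire match:

print greedy = extract(@"<.+>", 0, "<a><b>"),  // matches "<a><b>"
      lazy   = extract(@"<.+?>", 0, "<a><b>")  // matches "<a>"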

Empty matches

PatternDescription
^Beginning of a haystack or start-of-line with multi-line mode.
$End of a haystack or end-of-line with multi-line mode.
\AOnly the beginning of a haystack, even with multi-line mode enabled.
\zOnly the end of a haystack, even with multi-line mode enabled.
\bUnicode word boundary with \w on one side and \W, \A, or \z on other.
\BNot a Unicode word boundary.
\b{start}, \<Unicode start-of-word boundary with \W|\A at the start of the string and \w on the other side.
\b{end}, \>Unicode end-of-word boundary with \w on one side and \W|\z at the end.
\b{start-half}Half of a Unicode start-of-word boundary with \W|\A at the beginning of the boundary.
\b{end-half}Half of a Unicode end-of-word boundary with \W|\z at the end.

Grouping and flags

PatternDescription
(exp)Numbered capture group (indexed by opening parenthesis).
(?P<name>exp)Named capture group (names must be alpha-numeric).
(?<name>exp)Named capture group (names must be alpha-numeric).
(?:exp)Non-capturing group.
(?flags)Set flags within current group.
(?flags:exp)Set flags for exp (non-capturing).

Capture group names can contain only alpha-numeric Unicode codepoints, dots ., underscores _, and square brackets [ and ]. Names must start with either an _ or an alphabetic codepoint. Alphabetic codepoints correspond to the Alphabetic Unicode property, while numeric codepoints correspond to the union of the Decimal_Number, Letter_Number, and Other_Number general categories.

Flags are single characters. For example, (?x) sets the flag x and (?-x) clears the flag x. Multiple flags can be set or cleared at the same time: (?xy) sets both the x and y flags and (?x-y) sets the x flag and clears the y flag. By default all flags are disabled unless stated otherwise. They are:

FlagDescription
iCase-insensitive: letters match both upper and lower case.
mMulti-line mode: ^ and $ match begin/end of line.
sAllow dot (.) to match \n.
REnables CRLF mode: when multi-line mode is enabled, \r\n is used.
USwap the meaning of x* and x*?.
uUnicode support (enabled by default).
xVerbose mode: ignores whitespace and allows line comments (starting with #).

In verbose mode, whitespace is ignored everywhere, including within character classes. To insert whitespace, use its escaped form or a hex literal. For example, use \  (a backslash followed by a space) or \x20 for an ASCII space.
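
For example, the following sketch sets the i flag inline so that the match is case-insensitive:

print caseInsensitive = "KUSTO Query Language" matches regex "(?i)kusto"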

Escape sequences

PatternDescription
\*Literal *, applies to all ASCII except [0-9A-Za-z<>]
\aBell (\x07)
\fForm feed (\x0C)
\tHorizontal tab
\nNew line
\rCarriage return
\vVertical tab (\x0B)
\AMatches at the beginning of a haystack
\zMatches at the end of a haystack
\bWord boundary assertion
\BNegated word boundary assertion
\b{start}, \<Start-of-word boundary assertion
\b{end}, \>End-of-word boundary assertion
\b{start-half}Half of a start-of-word boundary assertion
\b{end-half}Half of an end-of-word boundary assertion
\123Octal character code, up to three digits
\x7FHex character code (exactly two digits)
\x{10FFFF}Hex character code corresponding to a Unicode code point
\u007FHex character code (exactly four digits)
\u{7F}Hex character code corresponding to a Unicode code point
\U0000007FHex character code (exactly eight digits)
\U{7F}Hex character code corresponding to a Unicode code point
\p{Letter}Unicode character class
\P{Letter}Negated Unicode character class
\d, \s, \wPerl character class
\D, \S, \WNegated Perl character class

Perl character classes (Unicode friendly)

These classes are based on the definitions provided in UTS#18:

PatternDescription
\dDigit (\p{Nd})
\DNot digit
\sWhitespace (\p{White_Space})
\SNot whitespace
\wWord character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control})
\WNot word character

ASCII character classes

These classes are based on the definitions provided in UTS#18:

PatternDescription
[[:alnum:]]Alphanumeric ([0-9A-Za-z])
[[:alpha:]]Alphabetic ([A-Za-z])
[[:ascii:]]ASCII ([\x00-\x7F])
[[:blank:]]Blank ([\t ])
[[:cntrl:]]Control ([\x00-\x1F\x7F])
[[:digit:]]Digits ([0-9])
[[:graph:]]Graphical ([!-~])
[[:lower:]]Lower case ([a-z])
[[:print:]]Printable ([ -~])
[[:punct:]]Punctuation ([!-/:-@\[-`{-~])
[[:space:]]Whitespace ([\t\n\v\f\r ])
[[:upper:]]Upper case ([A-Z])
[[:word:]]Word characters ([0-9A-Za-z_])
[[:xdigit:]]Hex digit ([0-9A-Fa-f])

Performance

This section provides some guidance on speed and resource usage of regex expressions.

Unicode can affect memory usage and search speed

KQL regex provides first class support for Unicode. In many cases, the extra memory required to support Unicode is negligible and doesn’t typically affect search speed.

The following are some examples of Unicode character classes that can affect memory usage and search speed:

  • Memory usage: The effect of Unicode primarily arises from the use of Unicode character classes. Unicode character classes tend to be larger in size. For example, the \w character class matches around 140,000 distinct codepoints by default. This requires more memory and can slow down regex compilation. If ASCII satisfies your requirements, use ASCII classes instead of Unicode classes. The ASCII-only version of \w can be expressed in multiple ways, all of which are equivalent.

    [0-9A-Za-z_]
    (?-u:\w)
    [[:word:]]
    [\w&&\p{ascii}]
    
  • Search speed: Unicode tends to be handled well, even when using large Unicode character classes. However, some of the faster internal regex engines can’t handle a Unicode aware word boundary assertion. So if you don’t need Unicode-aware word boundary assertions, you might consider using (?-u:\b) instead of \b. The (?-u:\b) uses an ASCII-only definition of a word character, which can improve search speed.
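
As a sketch of the word-boundary tip above (assuming ASCII-only input, where both forms match the same text):

print unicodeBoundary = extract(@"\berror\b", 0, "fatal error occurred"),
      asciiBoundary   = extract(@"(?-u:\b)error(?-u:\b)", 0, "fatal error occurred")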

Literals can accelerate searches

KQL regex has a strong ability to recognize literals within a regex pattern, which can significantly speed up searches. If possible, including literals in your pattern can greatly improve search performance. For example, in the regex \w+@\w+, first occurrences of @ are matched and then a reverse match is performed for \w+ to find the starting position.
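
For example, in the following sketch the literal @ lets the engine quickly locate candidate positions before matching the surrounding \w+ runs:

print mention = extract(@"\w+@\w+", 0, "Contact: alice@example.com for details")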

11.4 - Splunk to Kusto map

Learn how to write log queries in Kusto Query Language by comparing Splunk and Kusto Query Language concept mappings.

This article is intended to assist users who are familiar with Splunk in learning the Kusto Query Language to write log queries with Kusto. Direct comparisons are made between the two to highlight key differences and similarities, so you can build on your existing knowledge.

Structure and concepts

The following table compares concepts and data structures between Splunk and Kusto logs:

ConceptSplunkKustoComment
deployment unitclusterclusterKusto allows arbitrary cross-cluster queries. Splunk doesn’t.
data cachesbucketscaching and retention policiesControls the period and caching level for the data. This setting directly affects the performance of queries and the cost of the deployment.
logical partition of dataindexdatabaseAllows logical separation of the data. Both implementations allow unions and joining across these partitions.
structured event metadataN/AtableSplunk doesn’t expose the concept of event metadata to the search language. Kusto logs have the concept of a table, which has columns. Each event instance is mapped to a row.
recordeventrowTerminology change only.
record attributefieldcolumnIn Kusto, this setting is predefined as part of the table structure. In Splunk, each event has its own set of fields.
typesdatatypedatatypeKusto data types are more explicit because they’re set on the columns. Both have the ability to work dynamically with data types and roughly equivalent set of datatypes, including JSON support.
query and searchsearchqueryConcepts essentially are the same between Kusto and Splunk.
event ingestion timesystem timeingestion_time()In Splunk, each event gets a system timestamp of the time the event was indexed. In Kusto, you can define a policy called ingestion_time that exposes a system column that can be referenced through the ingestion_time() function.

Functions

The following table specifies functions in Kusto that are equivalent to Splunk functions.

SplunkKustoComment
strcatstrcat()(1)
splitsplit()(1)
ififf()(1)
tonumbertodouble()
tolong()
toint()
(1)
upper
lower
toupper()
tolower()
(1)
replacereplace_string(), replace_strings() or replace_regex()(1)
Although replace functions take three parameters in both products, the parameters are different.
substrsubstring()(1)
Also note that Splunk uses one-based indices, while Kusto uses zero-based indices.
tolowertolower()(1)
touppertoupper()(1)
matchmatches regex(2)
regexmatches regexIn Splunk, regex is an operator. In Kusto, it’s a relational operator.
searchmatch==In Splunk, searchmatch allows searching for the exact string.
randomrand()
rand(n)
Splunk’s function returns a number between zero and 2^31-1. Kusto’s returns a number between 0.0 and 1.0, or if a parameter is provided, between 0 and n-1.
nownow()(1)
relative_timetotimespan()(1)
In Kusto, Splunk’s equivalent of relative_time(datetimeVal, offsetVal) is datetimeVal + totimespan(offsetVal).
For example, search | eval n=relative_time(now(), "-1d@d") becomes ... | extend myTime = now() - totimespan("1d").

(1) In Splunk, the function is invoked by using the eval operator. In Kusto, it’s used as part of extend or project.
(2) In Splunk, the function is invoked by using the eval operator. In Kusto, it can be used with the where operator.

Operators

The following sections give examples of how to use different operators in Splunk and Kusto.

Search

In Splunk, you can omit the search keyword and specify an unquoted string. In Kusto, you must start each query with find, an unquoted string is a column name, and the lookup value must be a quoted string.

ProductOperatorExample
Splunksearchsearch Session.Id="c8894ffd-e684-43c9-9125-42adc25cd3fc" earliest=-24h
Kustofindfind Session.Id=="c8894ffd-e684-43c9-9125-42adc25cd3fc" and ingestion_time()> ago(24h)

Filter

Kusto log queries start from a tabular result set to which a filter is applied. In Splunk, filtering is the default operation on the current index. You can also use the where operator in Splunk, but we don’t recommend it.

ProductOperatorExample
SplunksearchEvent.Rule="330009.2" Session.Id="c8894ffd-e684-43c9-9125-42adc25cd3fc" _indextime>-24h
KustowhereOffice_Hub_OHubBGTaskError
| where Session_Id == "c8894ffd-e684-43c9-9125-42adc25cd3fc" and ingestion_time() > ago(24h)

Get n events or rows for inspection

Kusto log queries also support take as an alias to limit. In Splunk, if the results are ordered, head returns the first n results. In Kusto, limit isn’t ordered, but it returns the first n rows that are found.

ProductOperatorExample
SplunkheadEvent.Rule=330009.2
| head 100
KustolimitOffice_Hub_OHubBGTaskError
| limit 100

Get the first n events or rows ordered by a field or column

For the bottom results, in Splunk, you use tail. In Kusto, you can specify ordering direction by using asc.

ProductOperatorExample
SplunkheadEvent.Rule="330009.2"
| sort Event.Sequence
| head 20
KustotopOffice_Hub_OHubBGTaskError
| top 20 by Event_Sequence

Extend the result set with new fields or columns

The eval operator in Splunk is comparable to the extend operator in Kusto; don’t confuse it with Splunk’s eval function, which is a different construct. Both the eval operator in Splunk and the extend operator in Kusto support only scalar functions and arithmetic operators.

ProductOperatorExample
SplunkevalEvent.Rule=330009.2
| eval state= if(Data.Exception = "0", "success", "error")
KustoextendOffice_Hub_OHubBGTaskError
| extend state = iff(Data_Exception == 0,"success" ,"error")

Rename

Kusto uses the project-rename operator to rename a field. With project-rename, a query can take advantage of any indexes that are prebuilt for a field. Splunk has a rename operator that does the same.

ProductOperatorExample
SplunkrenameEvent.Rule=330009.2
| rename Date.Exception as exception
Kustoproject-renameOffice_Hub_OHubBGTaskError
| project-rename exception = Date_Exception

Format results and projection

Splunk uses the table command to select which columns to include in the results. Kusto has a project operator that does the same and more.

ProductOperatorExample
SplunktableEvent.Rule=330009.2
| table rule, state
KustoprojectOffice_Hub_OHubBGTaskError
| project exception, state

Splunk uses the fields - command to select which columns to exclude from the results. Kusto has a project-away operator that does the same.

ProductOperatorExample
Splunkfields -Event.Rule=330009.2
| fields - quota, highest_seller
Kustoproject-awayOffice_Hub_OHubBGTaskError
| project-away exception, state

Aggregation

See the list of summarize aggregation functions that are available.

Splunk operatorSplunk exampleKusto operatorKusto example
statssearch (Rule=120502.*)
| stats count by OSEnv, Audience
summarizeOffice_Hub_OHubBGTaskError
| summarize count() by App_Platform, Release_Audience
eventstats...
| stats count_i by time, category
| eventstats sum(count_i) AS count_total by _time_
joinT2
| join kind=inner (T1) on _time
| project _time, category, count_i, count_total

Join

join in Splunk has substantial limitations. The subquery has a limit of 10,000 results (set in the deployment configuration file), and a limited number of join flavors are available.

ProductOperatorExample
SplunkjoinEvent.Rule=120103* &#124; stats by Client.Id, Data.Alias
| join Client.Id max=0 [search earliest=-24h Event.Rule="150310.0" Data.Hresult=-2147221040]
Kustojoincluster("OAriaPPT").database("Office PowerPoint").Office_PowerPoint_PPT_Exceptions
| where Data_Hresult== -2147221040
| join kind = inner (Office_System_SystemHealthMetadata
| summarize by Client_Id, Data_Alias)on Client_Id

Sort

The default sort order is ascending. To specify descending order, add a minus sign (-) before the field name. Kusto also supports defining where to put nulls, either at the beginning or at the end.

ProductOperatorExample
SplunksortEvent.Rule=120103
| sort -Data.Hresult
Kustoorder byOffice_Hub_OHubBGTaskError
| order by Data_Hresult desc

Multivalue expand

The multivalue expand operator is similar in both Splunk and Kusto.

ProductOperatorExample
Splunkmvexpandmvexpand solutions
Kustomv-expandmv-expand solutions

Result facets, interesting fields

In Log Analytics in the Azure portal, only the first column is exposed. All columns are available through the API.

ProductOperatorExample
SplunkfieldsEvent.Rule=330009.2
| fields App.Version, App.Platform
KustofacetsOffice_Excel_BI_PivotTableCreate
| facet by App_Branch, App_Version

Deduplicate

In Kusto, you can use summarize arg_min() to reverse the order of which record is chosen.

ProductOperatorExample
SplunkdedupEvent.Rule=330009.2
| dedup device_id sortby -batterylife
Kustosummarize arg_max()Office_Excel_BI_PivotTableCreate
| summarize arg_max(batterylife, *) by device_id

11.5 - SQL to Kusto query translation

Learn about the Kusto Query Language equivalent of SQL queries.

If you’re familiar with SQL and want to learn KQL, translate SQL queries into KQL by prefacing the SQL query with a comment line, --, and the keyword explain. The output shows the KQL version of the query, which can help you understand the KQL syntax and concepts.

--
explain
SELECT COUNT_BIG(*) as C FROM StormEvents 

Output

Query
StormEvents
| summarize C=count()
| project C

SQL to Kusto cheat sheet

The following examples show sample queries in SQL and their KQL equivalents.

Select data from table
SQL:   SELECT * FROM dependencies
Kusto: dependencies
Learn more: Tabular expression statements

SQL:   SELECT name, resultCode FROM dependencies
Kusto: dependencies | project name, resultCode
Learn more: project

SQL:   SELECT TOP 100 * FROM dependencies
Kusto: dependencies | take 100
Learn more: take

Null evaluation
SQL:   SELECT * FROM dependencies WHERE resultCode IS NOT NULL
Kusto: dependencies | where isnotnull(resultCode)
Learn more: isnotnull()

Comparison operators (date)
SQL:   SELECT * FROM dependencies WHERE timestamp > getdate()-1
Kusto: dependencies | where timestamp > ago(1d)
Learn more: ago()

SQL:   SELECT * FROM dependencies WHERE timestamp BETWEEN ... AND ...
Kusto: dependencies | where timestamp between (datetime(2016-10-01) .. datetime(2016-11-01))
Learn more: between

Comparison operators (string)
SQL:   SELECT * FROM dependencies WHERE type = "Azure blob"
Kusto: dependencies | where type == "Azure blob"
Learn more: Logical operators

SQL:   -- substring
       SELECT * FROM dependencies WHERE type like "%blob%"
Kusto: // substring
       dependencies | where type has "blob"
Learn more: has

SQL:   -- wildcard
       SELECT * FROM dependencies WHERE type like "Azure%"
Kusto: // wildcard
       dependencies | where type startswith "Azure"
       // or
       dependencies | where type matches regex "^Azure.*"
Learn more: startswith, matches regex

Comparison (boolean)
SQL:   SELECT * FROM dependencies WHERE !(success)
Kusto: dependencies | where success == False
Learn more: Logical operators

Grouping, Aggregation
SQL:   SELECT name, AVG(duration) FROM dependencies GROUP BY name
Kusto: dependencies | summarize avg(duration) by name
Learn more: summarize, avg()

Distinct
SQL:   SELECT DISTINCT name, type FROM dependencies
Kusto: dependencies | summarize by name, type
Learn more: summarize, distinct

SQL:   SELECT name, COUNT(DISTINCT type) FROM dependencies GROUP BY name
Kusto: dependencies | summarize by name, type | summarize count() by name
       // or approximate for large sets
       dependencies | summarize dcount(type) by name
Learn more: count(), dcount()

Column aliases, Extending
SQL:   SELECT operationName as Name, AVG(duration) as AvgD FROM dependencies GROUP BY name
Kusto: dependencies | summarize AvgD = avg(duration) by Name=operationName
Learn more: Alias statement

SQL:   SELECT conference, CONCAT(sessionid, ' ', session_title) AS session FROM ConferenceSessions
Kusto: ConferenceSessions | extend session=strcat(sessionid, " ", session_title) | project conference, session
Learn more: strcat(), project

Ordering
SQL:   SELECT name, timestamp FROM dependencies ORDER BY timestamp ASC
Kusto: dependencies | project name, timestamp | sort by timestamp asc nulls last
Learn more: sort

Top n by measure
SQL:   SELECT TOP 100 name, COUNT(*) as Count FROM dependencies GROUP BY name ORDER BY Count DESC
Kusto: dependencies | summarize Count = count() by name | top 100 by Count desc
Learn more: top

Union
SQL:   SELECT * FROM dependencies UNION SELECT * FROM exceptions
Kusto: union dependencies, exceptions
Learn more: union

SQL:   SELECT * FROM dependencies WHERE timestamp > ... UNION SELECT * FROM exceptions WHERE timestamp > ...
Kusto: dependencies | where timestamp > ago(1d) | union (exceptions | where timestamp > ago(1d))

Join
SQL:   SELECT * FROM dependencies LEFT OUTER JOIN exceptions ON dependencies.operation_Id = exceptions.operation_Id
Kusto: dependencies | join kind = leftouter (exceptions) on $left.operation_Id == $right.operation_Id
Learn more: join

Nested queries
SQL:   SELECT * FROM dependencies WHERE resultCode == (SELECT TOP 1 resultCode FROM dependencies WHERE resultId = 7 ORDER BY timestamp DESC)
Kusto: dependencies | where resultCode == toscalar(dependencies | where resultId == 7 | top 1 by timestamp desc | project resultCode)
Learn more: toscalar

Having
SQL:   SELECT COUNT(*) FROM dependencies GROUP BY name HAVING COUNT(*) > 3
Kusto: dependencies | summarize Count = count() by name | where Count > 3
Learn more: summarize, where

11.6 - Timezone

This article lists the timezones supported by the Internet Assigned Numbers Authority (IANA) Time Zone Database.

The following is a list of timezones supported by the Internet Assigned Numbers Authority (IANA) Time Zone Database.

Related functions:

Timezone
Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers
Africa/Asmara
Africa/Asmera
Africa/Bamako
Africa/Bangui
Africa/Banjul
Africa/Bissau
Africa/Blantyre
Africa/Brazzaville
Africa/Bujumbura
Africa/Cairo
Africa/Casablanca
Africa/Ceuta
Africa/Conakry
Africa/Dakar
Africa/Dar_es_Salaam
Africa/Djibouti
Africa/Douala
Africa/El_Aaiun
Africa/Freetown
Africa/Gaborone
Africa/Harare
Africa/Johannesburg
Africa/Juba
Africa/Kampala
Africa/Khartoum
Africa/Kigali
Africa/Kinshasa
Africa/Lagos
Africa/Libreville
Africa/Lome
Africa/Luanda
Africa/Lubumbashi
Africa/Lusaka
Africa/Malabo
Africa/Maputo
Africa/Maseru
Africa/Mbabane
Africa/Mogadishu
Africa/Monrovia
Africa/Nairobi
Africa/Ndjamena
Africa/Niamey
Africa/Nouakchott
Africa/Ouagadougou
Africa/Porto-Novo
Africa/Sao_Tome
Africa/Timbuktu
Africa/Tripoli
Africa/Tunis
Africa/Windhoek
America/Adak
America/Anchorage
America/Anguilla
America/Antigua
America/Araguaina
America/Argentina/Buenos_Aires
America/Argentina/Catamarca
America/Argentina/ComodRivadavia
America/Argentina/Cordoba
America/Argentina/Jujuy
America/Argentina/La_Rioja
America/Argentina/Mendoza
America/Argentina/Rio_Gallegos
America/Argentina/Salta
America/Argentina/San_Juan
America/Argentina/San_Luis
America/Argentina/Tucuman
America/Argentina/Ushuaia
America/Aruba
America/Asuncion
America/Atikokan
America/Atka
America/Bahia
America/Bahia_Banderas
America/Barbados
America/Belem
America/Belize
America/Blanc-Sablon
America/Boa_Vista
America/Bogota
America/Boise
America/Buenos_Aires
America/Cambridge_Bay
America/Campo_Grande
America/Cancun
America/Caracas
America/Catamarca
America/Cayenne
America/Cayman
America/Chicago
America/Chihuahua
America/Coral_Harbour
America/Cordoba
America/Costa_Rica
America/Creston
America/Cuiaba
America/Curacao
America/Danmarkshavn
America/Dawson
America/Dawson_Creek
America/Denver
America/Detroit
America/Dominica
America/Edmonton
America/Eirunepe
America/El_Salvador
America/Ensenada
America/Fort_Nelson
America/Fort_Wayne
America/Fortaleza
America/Glace_Bay
America/Godthab
America/Goose_Bay
America/Grand_Turk
America/Grenada
America/Guadeloupe
America/Guatemala
America/Guayaquil
America/Guyana
America/Halifax
America/Havana
America/Hermosillo
America/Indiana/Indianapolis
America/Indiana/Knox
America/Indiana/Marengo
America/Indiana/Petersburg
America/Indiana/Tell_City
America/Indiana/Vevay
America/Indiana/Vincennes
America/Indiana/Winamac
America/Indianapolis
America/Inuvik
America/Iqaluit
America/Jamaica
America/Jujuy
America/Juneau
America/Kentucky/Louisville
America/Kentucky/Monticello
America/Knox_IN
America/Kralendijk
America/La_Paz
America/Lima
America/Los_Angeles
America/Louisville
America/Lower_Princes
America/Maceio
America/Managua
America/Manaus
America/Marigot
America/Martinique
America/Matamoros
America/Mazatlan
America/Mendoza
America/Menominee
America/Merida
America/Metlakatla
America/Mexico_City
America/Miquelon
America/Moncton
America/Monterrey
America/Montevideo
America/Montreal
America/Montserrat
America/Nassau
America/New_York
America/Nipigon
America/Nome
America/Noronha
America/North_Dakota/Beulah
America/North_Dakota/Center
America/North_Dakota/New_Salem
America/Nuuk
America/Ojinaga
America/Panama
America/Pangnirtung
America/Paramaribo
America/Phoenix
America/Port-au-Prince
America/Port_of_Spain
America/Porto_Acre
America/Porto_Velho
America/Puerto_Rico
America/Punta_Arenas
America/Rainy_River
America/Rankin_Inlet
America/Recife
America/Regina
America/Resolute
America/Rio_Branco
America/Rosario
America/Santa_Isabel
America/Santarem
America/Santiago
America/Santo_Domingo
America/Sao_Paulo
America/Scoresbysund
America/Shiprock
America/Sitka
America/St_Barthelemy
America/St_Johns
America/St_Kitts
America/St_Lucia
America/St_Thomas
America/St_Vincent
America/Swift_Current
America/Tegucigalpa
America/Thule
America/Thunder_Bay
America/Tijuana
America/Toronto
America/Tortola
America/Vancouver
America/Virgin
America/Whitehorse
America/Winnipeg
America/Yakutat
America/Yellowknife
Antarctica/Casey
Antarctica/Davis
Antarctica/DumontDUrville
Antarctica/Macquarie
Antarctica/Mawson
Antarctica/McMurdo
Antarctica/Palmer
Antarctica/Rothera
Antarctica/South_Pole
Antarctica/Syowa
Antarctica/Troll
Antarctica/Vostok
Arctic/Longyearbyen
Asia/Aden
Asia/Almaty
Asia/Amman
Asia/Anadyr
Asia/Aqtau
Asia/Aqtobe
Asia/Ashgabat
Asia/Ashkhabad
Asia/Atyrau
Asia/Baghdad
Asia/Bahrain
Asia/Baku
Asia/Bangkok
Asia/Barnaul
Asia/Beirut
Asia/Bishkek
Asia/Brunei
Asia/Calcutta
Asia/Chita
Asia/Choibalsan
Asia/Chongqing
Asia/Colombo
Asia/Dacca
Asia/Damascus
Asia/Dhaka
Asia/Dili
Asia/Dubai
Asia/Dushanbe
Asia/Famagusta
Asia/Gaza
Asia/Harbin
Asia/Hebron
Asia/Ho_Chi_Minh
Asia/Hong_Kong
Asia/Hovd
Asia/Irkutsk
Asia/Istanbul
Asia/Jakarta
Asia/Jayapura
Asia/Jerusalem
Asia/Kabul
Asia/Kamchatka
Asia/Karachi
Asia/Kashgar
Asia/Kathmandu
Asia/Katmandu
Asia/Khandyga
Asia/Kolkata
Asia/Krasnoyarsk
Asia/Kuala_Lumpur
Asia/Kuching
Asia/Kuwait
Asia/Macau
Asia/Magadan
Asia/Makassar
Asia/Manila
Asia/Muscat
Asia/Nicosia
Asia/Novokuznetsk
Asia/Novosibirsk
Asia/Omsk
Asia/Oral
Asia/Phnom_Penh
Asia/Pontianak
Asia/Pyongyang
Asia/Qatar
Asia/Qostanay
Asia/Qyzylorda
Asia/Rangoon
Asia/Riyadh
Asia/Sakhalin
Asia/Samarkand
Asia/Seoul
Asia/Shanghai
Asia/Singapore
Asia/Srednekolymsk
Asia/Taipei
Asia/Tashkent
Asia/Tbilisi
Asia/Tehran
Asia/Tel_Aviv
Asia/Thimbu
Asia/Thimphu
Asia/Tokyo
Asia/Tomsk
Asia/Ujung_Pandang
Asia/Ulaanbaatar
Asia/Ulan_Bator
Asia/Urumqi
Asia/Ust-Nera
Asia/Vientiane
Asia/Vladivostok
Asia/Yakutsk
Asia/Yangon
Asia/Yekaterinburg
Asia/Yerevan
Atlantic/Azores
Atlantic/Bermuda
Atlantic/Canary
Atlantic/Cape_Verde
Atlantic/Faeroe
Atlantic/Faroe
Atlantic/Jan_Mayen
Atlantic/Madeira
Atlantic/Reykjavik
Atlantic/South_Georgia
Atlantic/St_Helena
Atlantic/Stanley
Australia/ACT
Australia/Adelaide
Australia/Brisbane
Australia/Broken_Hill
Australia/Canberra
Australia/Currie
Australia/Darwin
Australia/Eucla
Australia/Hobart
Australia/LHI
Australia/Lindeman
Australia/Lord_Howe
Australia/Melbourne
Australia/NSW
Australia/North
Australia/Perth
Australia/Queensland
Australia/South
Australia/Sydney
Australia/Tasmania
Australia/Victoria
Australia/West
Australia/Yancowinna
Brazil/Acre
Brazil/DeNoronha
Brazil/East
Brazil/West
CET
CST6CDT
Canada/Atlantic
Canada/Central
Canada/Eastern
Canada/Mountain
Canada/Newfoundland
Canada/Pacific
Canada/Saskatchewan
Canada/Yukon
Chile/Continental
Chile/EasterIsland
Cuba
EET
EST
EST5EDT
Egypt
Eire
Etc/GMT
Etc/GMT+0
Etc/GMT+1
Etc/GMT+10
Etc/GMT+11
Etc/GMT+12
Etc/GMT+2
Etc/GMT+3
Etc/GMT+4
Etc/GMT+5
Etc/GMT+6
Etc/GMT+7
Etc/GMT+8
Etc/GMT+9
Etc/GMT-0
Etc/GMT-1
Etc/GMT-10
Etc/GMT-11
Etc/GMT-12
Etc/GMT-13
Etc/GMT-14
Etc/GMT-2
Etc/GMT-3
Etc/GMT-4
Etc/GMT-5
Etc/GMT-6
Etc/GMT-7
Etc/GMT-8
Etc/GMT-9
Etc/GMT0
Etc/Greenwich
Etc/UCT
Etc/UTC
Etc/Universal
Etc/Zulu
Europe/Amsterdam
Europe/Andorra
Europe/Astrakhan
Europe/Athens
Europe/Belfast
Europe/Belgrade
Europe/Berlin
Europe/Bratislava
Europe/Brussels
Europe/Bucharest
Europe/Budapest
Europe/Busingen
Europe/Chisinau
Europe/Copenhagen
Europe/Dublin
Europe/Gibraltar
Europe/Guernsey
Europe/Helsinki
Europe/Isle_of_Man
Europe/Istanbul
Europe/Jersey
Europe/Kaliningrad
Europe/Kyiv
Europe/Kirov
Europe/Lisbon
Europe/Ljubljana
Europe/London
Europe/Luxembourg
Europe/Madrid
Europe/Malta
Europe/Mariehamn
Europe/Minsk
Europe/Monaco
Europe/Moscow
Europe/Nicosia
Europe/Oslo
Europe/Paris
Europe/Podgorica
Europe/Prague
Europe/Riga
Europe/Rome
Europe/Samara
Europe/San_Marino
Europe/Sarajevo
Europe/Saratov
Europe/Simferopol
Europe/Skopje
Europe/Sofia
Europe/Stockholm
Europe/Tallinn
Europe/Tirane
Europe/Tiraspol
Europe/Ulyanovsk
Europe/Uzhgorod
Europe/Vaduz
Europe/Vatican
Europe/Vienna
Europe/Vilnius
Europe/Volgograd
Europe/Warsaw
Europe/Zagreb
Europe/Zaporozhye
Europe/Zurich
GB
GB-Eire
GMT
GMT+0
GMT-0
GMT0
Greenwich
HST
Hongkong
Iceland
Indian/Antananarivo
Indian/Chagos
Indian/Christmas
Indian/Cocos
Indian/Comoro
Indian/Kerguelen
Indian/Mahe
Indian/Maldives
Indian/Mauritius
Indian/Mayotte
Indian/Reunion
Iran
Israel
Jamaica
Japan
Kwajalein
Libya
MET
MST
MST7MDT
Mexico/BajaNorte
Mexico/BajaSur
Mexico/General
NZ
NZ-CHAT
Navajo
PRC
PST8PDT
Pacific/Apia
Pacific/Auckland
Pacific/Bougainville
Pacific/Chatham
Pacific/Chuuk
Pacific/Easter
Pacific/Efate
Pacific/Enderbury
Pacific/Fakaofo
Pacific/Fiji
Pacific/Funafuti
Pacific/Galapagos
Pacific/Gambier
Pacific/Guadalcanal
Pacific/Guam
Pacific/Honolulu
Pacific/Johnston
Pacific/Kanton
Pacific/Kiritimati
Pacific/Kosrae
Pacific/Kwajalein
Pacific/Majuro
Pacific/Marquesas
Pacific/Midway
Pacific/Nauru
Pacific/Niue
Pacific/Norfolk
Pacific/Noumea
Pacific/Pago_Pago
Pacific/Palau
Pacific/Pitcairn
Pacific/Pohnpei
Pacific/Ponape
Pacific/Port_Moresby
Pacific/Rarotonga
Pacific/Saipan
Pacific/Samoa
Pacific/Tahiti
Pacific/Tarawa
Pacific/Tongatapu
Pacific/Truk
Pacific/Wake
Pacific/Wallis
Pacific/Yap
Poland
Portugal
ROK
Singapore
Turkey
UCT
US/Alaska
US/Aleutian
US/Arizona
US/Central
US/East-Indiana
US/Eastern
US/Hawaii
US/Indiana-Starke
US/Michigan
US/Mountain
US/Pacific
US/Samoa
UTC
Universal
W-SU
WET
Zulu

12 - Scalar functions

12.1 - abs()

Learn how to use the abs() function to calculate the absolute value of an input.

Calculates the absolute value of the input.

Syntax

abs(x)

Parameters

NameTypeRequiredDescription
xint, real, or timespan✔️The value to make absolute.

Returns

Absolute value of x.

Example

print abs(-5)

Output

print_0
5

12.2 - acos()

Learn how to use the acos() function to calculate the angle of the cosine input.

Calculates the angle whose cosine is the specified number. Inverse operation of cos().

Syntax

acos(x)

Parameters

NameTypeRequiredDescription
xreal✔️The value used to calculate the arc cosine.

Returns

The value of the arc cosine of x. The return value is null if x < -1 or x > 1.
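
Example

The following minimal example shows one in-range input and one out-of-range input; acos(-1) returns pi (approximately 3.1416), and acos(2) returns null.

print acos(-1), acos(2)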

12.3 - ago()

Learn how to use the ago() function to subtract a given timespan from the current UTC clock time.

Subtracts the given timespan from the current UTC time.

Like now(), if you use ago() multiple times in a single query statement, the current UTC time being referenced is the same across all uses.

Syntax

ago(timespan)

Parameters

NameTypeRequiredDescription
timespantimespan✔️The interval to subtract from the current UTC clock time now(). For a full list of possible timespan values, see timespan literals.

Returns

A datetime value equal to the current time minus the timespan.

Example

All rows with a timestamp in the past hour:

T | where Timestamp > ago(1h)
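
Because both calls in a statement reference the same current UTC time, the following sketch always returns exactly one hour:

print window = now() - ago(1h)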

12.4 - around() function

Learn how to use the around() function to indicate if the first argument is within a range around the center value.

Creates a bool value indicating if the first argument is within a range around the center value.

Syntax

around(value,center,delta)

Parameters

NameTypeRequiredDescription
valueint, long, real, datetime, or timespan✔️The value to compare to the center.
centerint, long, real, datetime, or timespan✔️The center of the range defined as [(center-delta) .. (center + delta)].
deltaint, long, real, datetime, or timespan✔️The delta value of the range defined as [(center-delta) .. (center + delta)].

Returns

Returns true if the value is within the range, false if the value is outside the range. Returns null if any of the arguments is null.

Example: Filtering values around a specific timestamp

The following example filters rows around a specific timestamp.

range dt 
    from datetime(2021-01-01 01:00) 
    to datetime(2021-01-01 02:00) 
    step 1min
| where around(dt, datetime(2021-01-01 01:30), 1min)

Output

dt
2021-01-01 01:29:00.0000000
2021-01-01 01:30:00.0000000
2021-01-01 01:31:00.0000000

12.5 - array_concat()

Learn how to use the array_concat() function to concatenate many dynamic arrays to a single array.

Concatenates many dynamic arrays to a single array.

Syntax

array_concat(arr [, …])

Parameters

NameTypeRequiredDescription
arrdynamic✔️The arrays to concatenate into a dynamic array.

Returns

Returns a dynamic array of all input arrays.

Example

The following example shows concatenated arrays.

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend a1 = pack_array(x,y,z), a2 = pack_array(x, y)
| project array_concat(a1, a2)

Output

Column1
[1,2,4,1,2]
[2,4,8,2,4]
[3,6,12,3,6]

12.6 - array_iff()

Learn how to use the array_iff() function to scan and evaluate elements in an array.

Element-wise iif function on dynamic arrays.

Syntax

array_iff(condition_array, when_true, when_false)

Parameters

NameTypeRequiredDescription
condition_arraydynamic✔️An array of boolean or numeric values.
when_truedynamic or scalar✔️An array of values or primitive value. This will be the result when condition_array is true.
when_falsedynamic or scalar✔️An array of values or primitive value. This will be the result when condition_array is false.

Returns

Returns a dynamic array of the values taken either from the when_true or when_false array values, according to the corresponding value of the condition array.

Examples

print condition=dynamic([true,false,true]), if_true=dynamic([1,2,3]), if_false=dynamic([4,5,6]) 
| extend res= array_iff(condition, if_true, if_false)

Output

conditionif_trueif_falseres
[true, false, true][1, 2, 3][4, 5, 6][1, 5, 3]

Numeric condition values

print condition=dynamic([1,0,50]), if_true="yes", if_false="no" 
| extend res= array_iff(condition, if_true, if_false)

Output

conditionif_trueif_falseres
[1, 0, 50]yesno[yes, no, yes]

Non-numeric and non-boolean condition values

print condition=dynamic(["some string value", datetime("01-01-2022"), null]), if_true=1, if_false=0
| extend res= array_iff(condition, if_true, if_false)

Output

conditionif_trueif_falseres
["some string value","2022-01-01T00:00:00.0000000Z",null]10[null, null, null]

Mismatched array lengths

print condition=dynamic([true,true,true]), if_true=dynamic([1,2]), if_false=dynamic([3,4]) 
| extend res= array_iff(condition, if_true, if_false)

Output

conditionif_trueif_falseres
[true, true, true][1, 2][3, 4][1, 2, null]

12.7 - array_index_of()

Learn how to use the array_index_of() function to search an array for a specified item, and return its position.

Searches an array for the specified item, and returns its position.

Syntax

array_index_of(array, value [, start [, length [, occurrence ]]])

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array to search.
valuelong, int, datetime, timespan, string, guid, or bool✔️The value to lookup.
startintThe search start position. A negative value will offset the starting search value from the end of the array by abs(start) steps.
lengthintThe number of values to examine. A value of -1 means unlimited length.
occurrenceintThe number of the occurrence. The default is 1.

Returns

Returns a zero-based index position of lookup. Returns -1 if the value isn’t found in the array. Returns null for irrelevant inputs (occurrence < 0 or length < -1).

Example

The following example shows the position number of specific words within the array.

let arr=dynamic(["this", "is", "an", "example", "an", "example"]);
print
 idx1 = array_index_of(arr,"an")    // lookup found in input string
 , idx2 = array_index_of(arr,"example",1,3) // lookup found in searched range 
 , idx3 = array_index_of(arr,"example",1,2) // search starts from index 1, but stops after 2 values, so lookup can't be found
 , idx4 = array_index_of(arr,"is",2,4) // search starts after occurrence of lookup
 , idx5 = array_index_of(arr,"example",2,-1)  // lookup found
 , idx6 = array_index_of(arr, "an", 1, -1, 2)   // second occurrence found in input range
 , idx7 = array_index_of(arr, "an", 1, -1, 3)   // no third occurrence in input array
 , idx8 = array_index_of(arr, "an", -3)   // negative start index will look at last 3 elements
 , idx9 = array_index_of(arr, "is", -4)   // negative start index will look at the last 4 elements; "is" isn't among them

Output

idx1idx2idx3idx4idx5idx6idx7idx8idx9
23-1-134-14-1

Use set_has_element(arr, value) to check whether a value exists in an array. This function will improve the readability of your query. Both functions have the same performance.
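
For illustration, here's a minimal sketch (with arbitrary values) that performs both checks on the array from the example above:

let arr=dynamic(["this", "is", "an", "example", "an", "example"]);
print found = set_has_element(arr, "an"), idx = array_index_of(arr, "an")

Output

foundidx
true2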

12.8 - array_length()

Learn how to use the array_length() function to calculate the number of elements in a dynamic array.

Calculates the number of elements in a dynamic array.

Syntax

array_length(array)

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array for which to calculate length.

Returns

Returns the number of elements in array, or null if array isn’t an array.

Examples

The following example shows the number of elements in the array.

print array_length(dynamic([1, 2, 3, "four"]))

Output

print_0
4

12.9 - array_reverse()

Learn how to use the array_reverse() function to reverse the order of the elements in a dynamic array.

Reverses the order of the elements in a dynamic array.

Syntax

array_reverse(value)

Parameters

NameTypeRequiredDescription
valuedynamic✔️The array to reverse.

Returns

Returns an array that contains the same elements as the input array in reverse order.

Example

This example shows an array of words reversed.

print arr=dynamic(["this", "is", "an", "example"]) 
| project Result=array_reverse(arr)

Output

Result
[“example”,“an”,“is”,“this”]

12.10 - array_rotate_left()

Learn how to use the array_rotate_left() function to rotate values inside a dynamic array to the left.

Rotates values inside a dynamic array to the left.

Syntax

array_rotate_left(array, rotate_count)

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array to rotate.
rotate_countinteger✔️The number of positions that array elements will be rotated to the left. If the value is negative, the elements will be rotated to the right.

Returns

Dynamic array containing the same elements as the original array with each element rotated according to rotate_count.

Examples

Rotating to the left by two positions:

print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_left(arr, 2)

Output

arrarr_rotated
[1,2,3,4,5][3,4,5,1,2]

Rotating to the right by two positions by using negative rotate_count value:

print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_left(arr, -2)

Output

arrarr_rotated
[1,2,3,4,5][4,5,1,2,3]

12.11 - array_rotate_right()

Learn how to use the array_rotate_right() function to rotate values inside a dynamic array to the right.

Rotates values inside a dynamic array to the right.

Syntax

array_rotate_right(array, rotate_count)

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array to rotate.
rotate_countinteger✔️The number of positions that array elements will be rotated to the right. If the value is negative, the elements will be rotated to the left.

Returns

Dynamic array containing the same elements as the original array with each element rotated according to rotate_count.

Examples

Rotating to the right by two positions:

print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_right(arr, 2)

Output

arrarr_rotated
[1,2,3,4,5][4,5,1,2,3]

Rotating to the left by two positions by using negative rotate_count value:


print arr=dynamic([1,2,3,4,5])
| extend arr_rotated=array_rotate_right(arr, -2)

Output

arrarr_rotated
[1,2,3,4,5][3,4,5,1,2]

12.12 - array_shift_left()

Learn how to use the array_shift_left() function to shift the values inside a dynamic array to the left.

Shifts the values inside a dynamic array to the left.

Syntax

array_shift_left(array, shift_count [, default_value ])

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array to shift.
shift_countint✔️The number of positions that array elements are shifted to the left. If the value is negative, the elements are shifted to the right.
default_valuescalarThe value used for the new elements added in place of the elements that were shifted out. The default is null or an empty string, depending on the type of elements in the array.

Returns

Returns a dynamic array containing the same number of elements as in the original array. Each element has been shifted according to shift_count. New elements that are added in place of removed elements have a value of default_value.

Examples

Shifting to the left by two positions:

print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_left(arr, 2)

Output

arrarr_shift
[1,2,3,4,5][3,4,5,null,null]

Shifting to the left by two positions and adding default value:

print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_left(arr, 2, -1)

Output

arrarr_shift
[1,2,3,4,5][3,4,5,-1,-1]

Shifting to the right by two positions by using negative shift_count value:

print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_left(arr, -2, -1)

Output

arrarr_shift
[1,2,3,4,5][-1,-1,1,2,3]

12.13 - array_shift_right()

Learn how to use the array_shift_right() function to shift values inside a dynamic array to the right.

Shifts the values inside a dynamic array to the right.

Syntax

array_shift_right(array, shift_count [, default_value ])

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array to shift.
shift_countint✔️The number of positions that array elements are shifted to the right. If the value is negative, the elements are shifted to the left.
default_valuescalarThe value used for the new elements added in place of the elements that were shifted out. The default is null or an empty string, depending on the type of elements in the array.

Returns

Returns a dynamic array containing the same number of elements as in the original array. Each element has been shifted according to shift_count. New elements that are added in place of removed elements have a value of default_value.

Examples

Shifting to the right by two positions:

print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_right(arr, 2)

Output

arrarr_shift
[1,2,3,4,5][null,null,1,2,3]

Shifting to the right by two positions and adding a default value:

print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_right(arr, 2, -1)

Output

arrarr_shift
[1,2,3,4,5][-1,-1,1,2,3]

Shifting to the left by two positions by using a negative shift_count value:

print arr=dynamic([1,2,3,4,5])
| extend arr_shift=array_shift_right(arr, -2, -1)

Output

arrarr_shift
[1,2,3,4,5][3,4,5,-1,-1]

12.14 - array_slice()

Learn how to use the array_slice() function to extract a slice of a dynamic array.

Extracts a slice of a dynamic array.

Syntax

array_slice(array, start, end)

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array from which to extract the slice.
startint✔️The start index of the slice (inclusive). Negative values are converted to array_length+start.
endint✔️The last index of the slice (inclusive). Negative values are converted to array_length+end.

Returns

Returns a dynamic array of the values in the range [start..end] from array.

Examples

The following examples return a slice of the array.

print arr=dynamic([1,2,3]) 
| extend sliced=array_slice(arr, 1, 2)

Output

arrsliced
[1,2,3][2,3]

print arr=dynamic([1,2,3,4,5]) 
| extend sliced=array_slice(arr, 2, -1)

Output

arrsliced
[1,2,3,4,5][3,4,5]

print arr=dynamic([1,2,3,4,5]) 
| extend sliced=array_slice(arr, -3, -2)

Output

arrsliced
[1,2,3,4,5][3,4]

12.15 - array_sort_asc()

Learn how to use the array_sort_asc() function to sort arrays in ascending order.

Receives one or more arrays. Sorts the first array in ascending order. Orders the remaining arrays to match the reordered first array.

Syntax

array_sort_asc(array1[, …, arrayN][, nulls_last])

If nulls_last isn’t provided, a default value of true is used.

Parameters

NameTypeRequiredDescription
array1…arrayNdynamic✔️The array or list of arrays to sort.
nulls_lastboolDetermines whether nulls should be last.

Returns

Returns the same number of arrays as in the input, with the first array sorted in ascending order, and the remaining arrays ordered to match the reordered first array.

null is returned for every array that differs in length from the first one.

An array that contains elements of different types is sorted in the following order:

  • Numeric, datetime, and timespan elements
  • String elements
  • Guid elements
  • All other elements

Examples

The examples in this section show how to use the syntax to help you get started.

Sort two arrays

The following example sorts the initial array, array1, in ascending order. It then sorts array2 to match the new order of array1.

let array1 = dynamic([1,3,4,5,2]);
let array2 = dynamic(["a","b","c","d","e"]);
print array_sort_asc(array1,array2)

Output

array1_sortedarray2_sorted
[1,2,3,4,5][“a”,“e”,“b”,“c”,“d”]

Sort substrings

The following example sorts a list of names in ascending order. It saves a list of names to a variable, Names, which is then split into an array and sorted in ascending order. The query returns the names in ascending order.

let Names = "John,Paul,Jane,Kao";
let SortedNames = strcat_array(array_sort_asc(split(Names, ",")), ",");
print result = SortedNames

Output

result
Jane,John,Kao,Paul

Combine summarize and array_sort_asc

The following example uses the summarize operator and the array_sort_asc function to organize and sort commands by user in chronological order.

datatable(command:string, command_time:datetime, user_id:string)
[
    'chmod',   datetime(2019-07-15),   "user1",
    'ls',      datetime(2019-07-02),   "user1",
    'dir',     datetime(2019-07-22),   "user1",
    'mkdir',   datetime(2019-07-14),   "user1",
    'rm',      datetime(2019-07-27),   "user1",
    'pwd',     datetime(2019-07-25),   "user1",
    'rm',      datetime(2019-07-23),   "user2",
    'pwd',     datetime(2019-07-25),   "user2",
]
| summarize timestamps = make_list(command_time), commands = make_list(command) by user_id
| project user_id, commands_in_chronological_order = array_sort_asc(timestamps, commands)[1]

Output

user_idcommands_in_chronological_order
user1[
“ls”,
“mkdir”,
“chmod”,
“dir”,
“pwd”,
“rm”
]
user2[
“rm”,
“pwd”
]

Control location of null values

By default, null values are put last in the sorted array. However, you can control it explicitly by adding a bool value as the last argument to array_sort_asc().

The following example shows the default behavior:

print result=array_sort_asc(dynamic([null,"blue","yellow","green",null]))

Output

result
[“blue”,“green”,“yellow”,null,null]

The following example shows nondefault behavior using the false parameter, which specifies that nulls are placed at the beginning of the array.

print result=array_sort_asc(dynamic([null,"blue","yellow","green",null]), false)

Output

result
[null,null,“blue”,“green”,“yellow”]

12.16 - array_sort_desc()

Learn how to use the array_sort_desc() function to sort arrays in descending order.

Receives one or more arrays. Sorts the first array in descending order. Orders the remaining arrays to match the reordered first array.

Syntax

array_sort_desc(array1[, …, arrayN])

array_sort_desc(array1[, …, arrayN], nulls_last)

If nulls_last isn’t provided, a default value of true is used.

Parameters

NameTypeRequiredDescription
array1…arrayNdynamic✔️The array or list of arrays to sort.
nulls_lastboolDetermines whether nulls should be last.

Returns

Returns the same number of arrays as in the input, with the first array sorted in descending order, and the remaining arrays ordered to match the reordered first array.

null is returned for every array that differs in length from the first one.

An array that contains elements of different types is sorted in the following order:

  • Numeric, datetime, and timespan elements
  • String elements
  • Guid elements
  • All other elements

Examples

The examples in this section show how to use the syntax to help you get started.

Sort two arrays

The following example sorts the initial array, array1, in descending order. It then sorts array2 to match the new order of array1.

let array1 = dynamic([1,3,4,5,2]);
let array2 = dynamic(["a","b","c","d","e"]);
print array_sort_desc(array1,array2)

Output

array1_sortedarray2_sorted
[5,4,3,2,1][“d”,“c”,“b”,“e”,“a”]

Sort substrings

The following example sorts a list of names in descending order. It saves a list of names to a variable, Names, which is then split into an array and sorted in descending order. The query returns the names in descending order.

let Names = "John,Paul,Jane,Kayo";
let SortedNames = strcat_array(array_sort_desc(split(Names, ",")), ",");
print result = SortedNames

Output

result
Paul,Kayo,John,Jane

Combine summarize and array_sort_desc

The following example uses the summarize operator and the array_sort_desc function to organize and sort commands by user in descending chronological order.

datatable(command:string, command_time:datetime, user_id:string)
[
    'chmod',   datetime(2019-07-15),   "user1",
    'ls',      datetime(2019-07-02),   "user1",
    'dir',     datetime(2019-07-22),   "user1",
    'mkdir',   datetime(2019-07-14),   "user1",
    'rm',      datetime(2019-07-27),   "user1",
    'pwd',     datetime(2019-07-25),   "user1",
    'rm',      datetime(2019-07-23),   "user2",
    'pwd',     datetime(2019-07-25),   "user2",
]
| summarize timestamps = make_list(command_time), commands = make_list(command) by user_id
| project user_id, commands_in_chronological_order = array_sort_desc(timestamps, commands)[1]

Output

user_idcommands_in_chronological_order
user1[
“rm”,
“pwd”,
“dir”,
“chmod”,
“mkdir”,
“ls”
]
user2[
“pwd”,
“rm”
]

Control location of null values

By default, null values are put last in the sorted array. However, you can control it explicitly by adding a bool value as the last argument to array_sort_desc().

The following example shows the default behavior:

print result=array_sort_desc(dynamic([null,"blue","yellow","green",null]))

Output

result
[“yellow”,“green”,“blue”,null,null]

The following example shows nondefault behavior using the false parameter, which specifies that nulls are placed at the beginning of the array.

print result=array_sort_desc(dynamic([null,"blue","yellow","green",null]), false)

Output

result
[null,null,“yellow”,“green”,“blue”]

12.17 - array_split()

Learn how to use the array_split() function to split an array into multiple arrays.

Splits an array to multiple arrays according to the split indices and packs the generated array in a dynamic array.

Syntax

array_split(array, index)

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array to split.
indexint or dynamic✔️An integer or dynamic array of integers used to indicate the location at which to split the array. The start index of arrays is zero. Negative values are converted to array_length + value.

Returns

Returns a dynamic array containing N+1 arrays with the values in the range [0..i1), [i1..i2), ... [iN..array_length) from array, where N is the number of input indices and i1...iN are the indices.

Examples

The following example shows how to split an array.

print arr=dynamic([1,2,3,4,5]) 
| extend arr_split=array_split(arr, 2)

Output

arrarr_split
[1,2,3,4,5][[1,2],[3,4,5]]

print arr=dynamic([1,2,3,4,5]) 
| extend arr_split=array_split(arr, dynamic([1,3]))

Output

arrarr_split
[1,2,3,4,5][[1],[2,3],[4,5]]

12.18 - array_sum()

Learn how to use the array_sum() function to calculate the sum of elements in a dynamic array.

Calculates the sum of elements in a dynamic array.

Syntax

array_sum(array)

Parameters

NameTypeRequiredDescription
arraydynamic✔️The array to sum.

Returns

Returns a double type value with the sum of the elements of the array.

Example

The following example shows the sum of the elements in an array.

print arr=dynamic([1,2,3,4]) 
| extend arr_sum=array_sum(arr)

Output

arrarr_sum
[1,2,3,4]10

12.19 - asin()

Learn how to use the asin() function to calculate the angle from a sine input.

Calculates the angle whose sine is the specified number, or the arc sine. This is the inverse operation of sin().

Syntax

asin(x)

Parameters

NameTypeRequiredDescription
xreal✔️A real number in range [-1, 1] used to calculate the arc sine.

Returns

Returns the value of the arc sine of x. Returns null if x < -1 or x > 1.

Example

print result = asin(0.5)

Output

result
0.5235987755982988

12.20 - assert()

Learn how to use the assert() function to check for a condition and output an error message when false.

Checks for a condition. If the condition is false, outputs error messages and fails the query.

Syntax

assert(condition,message)

Parameters

NameTypeRequiredDescription
conditionbool✔️The conditional expression to evaluate. The condition must evaluate to a constant during the query analysis phase.
messagestring✔️The message used if the assertion evaluates to false.

Returns

Returns true if the condition is true. Raises a semantic error if the condition is evaluated to false.

Examples

The following query defines a function checkLength() that checks input string length, and uses assert to validate the input length parameter (checking that it's greater than zero).

let checkLength = (len:long, s:string)
{
    assert(len > 0, "Length must be greater than zero") and
    strlen(s) > len
};
datatable(input:string)
[
    '123',
    '4567'
]
| where checkLength(len=long(-1), input)

Running this query yields an error: assert() has failed with message: 'Length must be greater than zero'

Example of running with valid len input:

let checkLength = (len:long, s:string)
{
    assert(len > 0, "Length must be greater than zero") and strlen(s) > len
};
datatable(input:string)
[
    '123',
    '4567'
]
| where checkLength(len=3, input)

Output

input
4567

The following query will always fail, demonstrating that the assert function gets evaluated even though the where b operator returns no data when b is false:

let b=false;
print x="Hello"
| where b
| where assert(b, "Assertion failed")

12.21 - atan()

Learn how to use the atan() function to return the inverse operation of tan().

Returns the angle whose tangent is the specified number. This is the inverse operation of tan().

Syntax

atan(x)

Parameters

NameTypeRequiredDescription
xreal✔️The number used to calculate the arc tangent.

Returns

The value of the arc tangent of x.

Example

print result = atan(0.5)

Output

result
0.46364760900080609

12.22 - atan2()

Learn how to use the atan2() function to calculate an angle in radians between axes.

Calculates the angle, in radians, between the positive x-axis and the ray from the origin to the point (y, x).

Syntax

atan2(y,x)

Parameters

NameTypeRequiredDescription
yreal✔️The Y coordinate.
xreal✔️The X coordinate.

Returns

Returns the angle in radians between the positive x-axis and the ray from the origin to the point (y, x).

Examples

The following example returns the angle measurements in radians.

print atan2_0 = atan2(1,1) // Pi / 4 radians (45 degrees)
| extend atan2_1 = atan2(0,-1) // Pi radians (180 degrees)
| extend atan2_2 = atan2(-1,0) // - Pi / 2 radians (-90 degrees)

Output

atan2_0atan2_1atan2_2
0.7853981633974483.14159265358979-1.5707963267949

12.23 - bag_has_key()

Learn how to use the bag_has_key() function to check if a dynamic property bag object contains a given key.

Checks whether a dynamic property bag object contains a given key.

Syntax

bag_has_key(bag,key)

Parameters

NameTypeRequiredDescription
bagdynamic✔️The property bag to search.
keystring✔️The key for which to search. Search for a nested key using the JSONPath notation. Array indexing isn’t supported.

Returns

True or false depending on if the key exists in the bag.

Examples

datatable(input: dynamic)
[
    dynamic({'key1' : 123, 'key2': 'abc'}),
    dynamic({'key1' : 123, 'key3': 'abc'}),
]
| extend result = bag_has_key(input, 'key2')

Output

inputresult
{
“key1”: 123,
“key2”: “abc”
}
true
{
“key1”: 123,
“key3”: “abc”
}
false

Search using a JSONPath key

datatable(input: dynamic)
[
    dynamic({'key1': 123, 'key2': {'prop1' : 'abc', 'prop2': 'xyz'}, 'key3': [100, 200]}),
]
| extend result = bag_has_key(input, '$.key2.prop1')

Output

inputresult
{
“key1”: 123,
“key2”: {
“prop1”: “abc”,
“prop2”: “xyz”
},
“key3”: [
100,
200
]
}
true

12.24 - bag_keys()

Learn how to use the bag_keys() function to enumerate the root keys in a dynamic property bag object.

Enumerates all the root keys in a dynamic property bag object.

Syntax

bag_keys(object)

Parameters

NameTypeRequiredDescription
objectdynamic✔️The property bag object for which to enumerate keys.

Returns

An array of the keys; the order is undetermined.

Example

datatable(index:long, d:dynamic) [
    1, dynamic({'a':'b', 'c':123}), 
    2, dynamic({'a':'b', 'c':{'d':123}}),
    3, dynamic({'a':'b', 'c':[{'d':123}]}),
    4, dynamic(null),
    5, dynamic({}),
    6, dynamic('a'),
    7, dynamic([])
]
| extend keys = bag_keys(d)

Output

indexdkeys
1{
“a”: “b”,
“c”: 123
}
[
“a”,
“c”
]
2{
“a”: “b”,
“c”: {
“d”: 123
}
}
[
“a”,
“c”
]
3{
“a”: “b”,
“c”: [
{
“d”: 123
}
]
}
[
“a”,
“c”
]
4
5{}[]
6a
7[]

12.25 - bag_merge()

Learn how to use the bag_merge() function to merge property bags.

The function merges multiple dynamic property bags into a single dynamic property bag object, consolidating all properties from the input bags.

Syntax

bag_merge(bag1, bag2 [, bag3, ...])

Parameters

NameTypeRequiredDescription
bag1…bagNdynamic✔️The property bags to merge. The function accepts between 2 and 64 arguments.

Returns

A dynamic property bag containing the merged results of all input property bags. If a key is present in multiple input bags, the value associated with the key from the leftmost argument takes precedence.

Example

print result = bag_merge(
   dynamic({'A1':12, 'B1':2, 'C1':3}),
   dynamic({'A2':81, 'B2':82, 'A1':1}))

Output

result
{
“A1”: 12,
“B1”: 2,
“C1”: 3,
“A2”: 81,
“B2”: 82
}

12.26 - bag_pack_columns()

Learn how to use the bag_pack_columns() function to create a dynamic JSON object from a list of columns.

Creates a dynamic property bag object from a list of columns.

Syntax

bag_pack_columns(column1, column2,... )

Parameters

NameTypeRequiredDescription
columnscalar✔️A column to pack. The name of the column is the property name in the property bag.

Returns

Returns a dynamic property bag object from the listed columns.

Examples

The following example creates a property bag that includes the Id and Value columns:

datatable(Id: string, Value: string, Other: long)
[
    "A", "val_a", 1,
    "B", "val_b", 2,
    "C", "val_c", 3
]
| extend Packed = bag_pack_columns(Id, Value)

Output

IdValueOtherPacked
Aval_a1{
“Id”: “A”,
“Value”: “val_a”
}
Bval_b2{
“Id”: “B”,
“Value”: “val_b”
}
Cval_c3{
“Id”: “C”,
“Value”: “val_c”
}


12.27 - bag_pack()

Learn how to use the bag_pack() function to create a dynamic JSON object from a list of keys and values.

Creates a dynamic property bag object from a list of keys and values.

Syntax

bag_pack(key1, value1, key2, value2,... )

Parameters

NameTypeRequiredDescription
keystring✔️The key name.
valueany scalar data type✔️The key value.

Returns

Returns a dynamic property bag object from the listed key and value inputs.

Examples

Example 1

The following example creates and returns a property bag from an alternating list of keys and values.

print bag_pack("Level", "Information", "ProcessID", 1234, "Data", bag_pack("url", "www.bing.com"))

Results

print_0
{“Level”:“Information”,“ProcessID”:1234,“Data”:{“url”:“www.bing.com”}}

Example 2

The following example creates a property bag and extracts values from it by using the ‘.’ operator.

datatable (
    Source: int,
    Destination: int,
    Message: string
) [
    1234, 100, "AA", 
    4567, 200, "BB",
    1212, 300, "CC" 
]
| extend MyBag=bag_pack("Dest", Destination, "Mesg", Message)
| project-away Source, Destination, Message
| extend MyBag_Dest=MyBag.Dest, MyBag_Mesg=MyBag.Mesg

Results

MyBagMyBag_DestMyBag_Mesg
{“Dest”:100,“Mesg”:“AA”}100AA
{“Dest”:200,“Mesg”:“BB”}200BB
{“Dest”:300,“Mesg”:“CC”}300CC

Example 3

The following example uses two tables, SmsMessages and MmsMessages, and returns their common columns and a property bag from the other columns. The tables are created ad-hoc as part of the query.

SmsMessages

SourceNumberTargetNumberCharsCount
555-555-1234555-555-121246
555-555-1234555-555-121350
555-555-1212555-555-123432

MmsMessages

SourceNumberTargetNumberAttachmentSizeAttachmentTypeAttachmentName
555-555-1212555-555-1213200jpegPic1
555-555-1234555-555-1212250jpegPic2
555-555-1234555-555-1213300pngPic3
let SmsMessages = datatable (
    SourceNumber: string,
    TargetNumber: string,
    CharsCount: string
) [
    "555-555-1234", "555-555-1212", "46", 
    "555-555-1234", "555-555-1213", "50",
    "555-555-1212", "555-555-1234", "32" 
];
let MmsMessages = datatable (
    SourceNumber: string,
    TargetNumber: string,
    AttachmentSize: string,
    AttachmentType: string,
    AttachmentName: string
) [
    "555-555-1212", "555-555-1213", "200", "jpeg", "Pic1",
    "555-555-1234", "555-555-1212", "250", "jpeg", "Pic2",
    "555-555-1234", "555-555-1213", "300", "png", "Pic3"
];
SmsMessages 
| join kind=inner MmsMessages on SourceNumber
| extend Packed=bag_pack("CharsCount", CharsCount, "AttachmentSize", AttachmentSize, "AttachmentType", AttachmentType, "AttachmentName", AttachmentName) 
| where SourceNumber == "555-555-1234"
| project SourceNumber, TargetNumber, Packed

Results

SourceNumberTargetNumberPacked
555-555-1234555-555-1213{“CharsCount”:“50”,“AttachmentSize”:“250”,“AttachmentType”:“jpeg”,“AttachmentName”:“Pic2”}
555-555-1234555-555-1212{“CharsCount”:“46”,“AttachmentSize”:“250”,“AttachmentType”:“jpeg”,“AttachmentName”:“Pic2”}
555-555-1234555-555-1213{“CharsCount”:“50”,“AttachmentSize”:“300”,“AttachmentType”:“png”,“AttachmentName”:“Pic3”}
555-555-1234555-555-1212{“CharsCount”:“46”,“AttachmentSize”:“300”,“AttachmentType”:“png”,“AttachmentName”:“Pic3”}

12.28 - bag_remove_keys()

Learn how to use the bag_remove_keys() function to remove keys and associated values from property bags.

Removes keys and associated values from a dynamic property bag.

Syntax

bag_remove_keys(bag,keys)

Parameters

NameTypeRequiredDescription
bagdynamic✔️The property bag from which to remove keys.
keysdynamic✔️List of keys to be removed from the input. The keys are the first level of the property bag. You can specify keys on the nested levels using JSONPath notation. Array indexing isn’t supported.

Returns

Returns a dynamic property bag without specified keys and their values.

Examples

datatable(input:dynamic)
[
    dynamic({'key1' : 123,     'key2': 'abc'}),
    dynamic({'key1' : 'value', 'key3': 42.0}),
]
| extend result=bag_remove_keys(input, dynamic(['key2', 'key4']))

Output

inputresult
{
“key1”: 123,
“key2”: “abc”
}
{
“key1”: 123
}
{
“key1”: “value”,
“key3”: 42.0
}
{
“key1”: “value”,
“key3”: 42.0
}

Remove inner properties of dynamic values using JSONPath notation

datatable(input:dynamic)
[
    dynamic({'key1': 123, 'key2': {'prop1' : 'abc', 'prop2': 'xyz'}, 'key3': [100, 200]}),
]
| extend result=bag_remove_keys(input, dynamic(['$.key2.prop1', 'key3']))

Output

inputresult
{
“key1”: 123,
“key2”: {
“prop1”: “abc”,
“prop2”: “xyz”
},
“key3”: [
100,
200
]
}
{
“key1”: 123,
“key2”: {
“prop2”: “xyz”
}
}

12.29 - bag_set_key()

Learn how to use the bag_set_key() function to set a given key to a given value in a dynamic property-bag.

bag_set_key() receives a dynamic property-bag, a key and a value. The function sets the given key in the bag to the given value. The function overrides any existing value in case the key already exists.

Syntax

bag_set_key(bag,key,value)

Parameters

NameTypeRequiredDescription
bagdynamic✔️The property bag to modify.
keystring✔️The key to set. Either a JSON path (you can specify a key on the nested levels using JSONPath notation) or the key name for a root level key. Array indexing or root JSON paths aren’t supported.
valueany scalar data type✔️The value to which the key is set.

Returns

Returns a dynamic property-bag with specified key-value pairs. If the input bag isn’t a property-bag, a null value is returned.

Examples

Use a root-level key

datatable(input: dynamic) [
    dynamic({'key1': 1, 'key2': 2}), 
    dynamic({'key1': 1, 'key3': 'abc'}),
]
| extend result = bag_set_key(input, 'key3', 3)

Output

inputresult
{
“key1”: 1,
“key2”: 2
}
{
“key1”: 1,
“key2”: 2,
“key3”: 3
}
{
“key1”: 1,
“key3”: “abc”
}
{
“key1”: 1,
“key3”: 3
}

Use a JSONPath key

datatable(input: dynamic)[
    dynamic({'key1': 123, 'key2': {'prop1': 123, 'prop2': 'xyz'}}),
    dynamic({'key1': 123})
]
| extend result = bag_set_key(input, '$.key2.prop1', 'abc')

Output

inputresult
{
“key1”: 123,
“key2”: {
“prop1”: 123,
“prop2”: “xyz”
}
}
{
“key1”: 123,
“key2”: {
“prop1”: “abc”,
“prop2”: “xyz”
}
}
{
“key1”: 123
}
{
“key1”: 123,
“key2”: {
“prop1”: “abc”
}
}
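
As noted in the Returns section, if the input isn't a property bag, a null value is returned. The following query is an illustrative sketch that passes an array instead of a property bag:

print result = bag_set_key(dynamic([1, 2, 3]), 'key1', 1)

Because the input is an array rather than a property bag, result is null.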

12.30 - bag_zip()

Learn how to use bag_zip() to merge two dynamic arrays into a single property-bag of keys and values.

Creates a dynamic property-bag from two input dynamic arrays. In the resulting property-bag, the values from the first input array are used as the property keys, while the values from the second input array are used as corresponding property values.

Syntax

bag_zip(KeysArray, ValuesArray)

Parameters

NameTypeRequiredDescription
KeysArraydynamic✔️An array of strings. These strings represent the property names for the resulting property-bag.
ValuesArraydynamic✔️An array whose values will be the property values for the resulting property-bag.

Returns

Returns a dynamic property-bag.

Examples

In the following example, the array of keys and the array of values are the same length and are zipped together into a dynamic property bag.

let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
    dynamic(['a', 'b', 'c']), dynamic([1, '2', 3.4])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)

Output

KeysArrayValuesArrayNewBag
[‘a’,‘b’,‘c’][1,‘2’,3.4]{‘a’: 1,‘b’: ‘2’,‘c’: 3.4}

More keys than values

In the following example, the array of keys is longer than the array of values. The missing values are filled with nulls.

let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
    dynamic(['a', 'b', 'c']), dynamic([1, '2'])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)

Output

KeysArrayValuesArrayNewBag
[‘a’,‘b’,‘c’][1,‘2’]{‘a’: 1,‘b’: ‘2’,‘c’: null}

More values than keys

In the following example, the array of values is longer than the array of keys. Values with no matching keys are ignored.

let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
    dynamic(['a', 'b']), dynamic([1, '2', 2.5])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)

Output

KeysArrayValuesArrayNewBag
[‘a’,‘b’][1,‘2’,2.5]{‘a’: 1,‘b’: ‘2’}

Non-string keys

In the following example, some values in the keys array aren't of type string. The non-string keys are ignored.

let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
    dynamic(['a', 8, 'b']), dynamic([1, '2', 2.5])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)

Output

KeysArrayValuesArrayNewBag
[‘a’,8,‘b’][1,‘2’,2.5]{‘a’: 1,‘b’: 2.5}

Fill values with null

In the following example, the parameter that is supposed to be an array of values isn’t an array, so all values are filled with nulls.

let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
    dynamic(['a', 8, 'b']), dynamic(1)
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)

Output

KeysArrayValuesArrayNewBag
[‘a’,8,‘b’]1{‘a’: null,‘b’: null}

Null property-bag

In the following example, the parameter that is supposed to be an array of keys isn’t an array, so the resulting property-bag is null.

let Data = datatable(KeysArray: dynamic, ValuesArray: dynamic) [
    dynamic('a'), dynamic([1, '2', 2.5])
];
Data
| extend NewBag = bag_zip(KeysArray, ValuesArray)
| extend IsNewBagEmpty=isnull(NewBag)

Output

KeysArrayValuesArrayNewBagIsNewBagEmpty
a[1,‘2’,2.5]TRUE

12.31 - base64_decode_toarray()

Learn how to use the base64_decode_toarray() function to decode a base64 string into an array of long values.

Decodes a base64 string to an array of long values.

Syntax

base64_decode_toarray(base64_string)

Parameters

NameTypeRequiredDescription
base64_stringstring✔️The value to decode from base64 to an array of long values.

Returns

Returns an array of long values decoded from a base64 string.

Example

print Quine=base64_decode_toarray("S3VzdG8=")  
// 'K', 'u', 's', 't', 'o'

Output

Quine
[75,117,115,116,111]

12.32 - base64_decode_toguid()

Learn how to use base64_decode_toguid() function to return a GUID from a base64 string.

Decodes a base64 string to a GUID.

Syntax

base64_decode_toguid(base64_string)

Parameters

NameTypeRequiredDescription
base64_stringstring✔️The value to decode from base64 to a GUID.

Returns

Returns a GUID decoded from a base64 string.

Example

print Quine = base64_decode_toguid("JpbpECu8dUy7Pv5gbeJXAA==")  

Output

Quine
10e99626-bc2b-754c-bb3e-fe606de25700

If you try to decode an invalid base64 string, a null value is returned:

print Empty = base64_decode_toguid("abcd1231")

To encode a GUID to a base64 string, see base64_encode_fromguid().
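
Combining the two functions round-trips the value: the following query is an illustrative sketch that re-encodes the GUID decoded in the earlier example and reproduces the original base64 string.

print RoundTrip = base64_encode_fromguid(base64_decode_toguid("JpbpECu8dUy7Pv5gbeJXAA=="))

Output

RoundTrip
JpbpECu8dUy7Pv5gbeJXAA==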

12.33 - base64_decode_tostring()

Learn how to use a base64_decode_tostring() function to decode a base64 string into a UTF-8 string.

Decodes a base64 string to a UTF-8 string.

Syntax

base64_decode_tostring(base64_string)

Parameters

NameTypeRequiredDescription
base64_stringstring✔️The value to decode from base64 to UTF-8 string.

Returns

Returns UTF-8 string decoded from base64 string.

Example

print Quine=base64_decode_tostring("S3VzdG8=")

Output

Quine
Kusto

Trying to decode a base64 string that was generated from invalid UTF-8 encoding returns null:

print Empty=base64_decode_tostring("U3RyaW5n0KHR0tGA0L7Rh9C60LA=")

Output

Empty

12.34 - base64_encode_fromarray()

Learn how to use the base64_encode_fromarray() function to encode a base64 string from a bytes array.

Encodes a base64 string from a bytes array.

Syntax

base64_encode_fromarray(base64_string_decoded_as_a_byte_array)

Parameters

NameTypeRequiredDescription
base64_string_decoded_as_a_byte_arraydynamic✔️The bytes (integer) array to be encoded into a base64 string.

Returns

Returns the base64 string encoded from the bytes array. Note that byte is an integer type.

Examples

let bytes_array = toscalar(print base64_decode_toarray("S3VzdG8="));
print decoded_base64_string = base64_encode_fromarray(bytes_array)

Output

decoded_base64_string
S3VzdG8=

Trying to encode a base64 string from an invalid bytes array, such as one generated from an invalid UTF-8 encoded string, returns null:

let empty_bytes_array = toscalar(print base64_decode_toarray("U3RyaW5n0KHR0tGA0L7Rh9C60LA"));
print empty_string = base64_encode_fromarray(empty_bytes_array)

Output

empty_string

12.35 - base64_encode_fromguid()

Learn how to use the base64_encode_fromguid() function to return a base64 string from a GUID.

Encodes a GUID to a base64 string.

Syntax

base64_encode_fromguid(guid)

Parameters

NameTypeRequiredDescription
guidguid✔️The value to encode to a base64 string.

Returns

Returns a base64 string encoded from a GUID.

Example

print Quine = base64_encode_fromguid(toguid("ae3133f2-6e22-49ae-b06a-16e6a9b212eb"))  

Output

Quine
8jMxriJurkmwahbmqbIS6w==

If you try to encode anything that isn't a GUID, as in the following example, an error is thrown:

print Empty = base64_encode_fromguid("abcd1231")

12.36 - base64_encode_tostring()

This article describes base64_encode_tostring().

Encodes a string as base64 string.

Syntax

base64_encode_tostring(string)

Parameters

NameTypeRequiredDescription
stringstring✔️The value to encode as a base64 string.

Returns

Returns string encoded as a base64 string.

Example

print Quine=base64_encode_tostring("Kusto")

Output

Quine
S3VzdG8=

12.37 - beta_cdf()

Learn how to use the beta_cdf() function to return a standard beta cumulative distribution function.

Returns the standard cumulative beta distribution function.

If probability = beta_cdf(x,…), then beta_inv(probability,…) = x.

The beta distribution is commonly used to study variation in the percentage of something across samples, such as the fraction of the day people spend watching television.

Syntax

beta_cdf(x, alpha, beta)

Parameters

NameTypeRequiredDescription
xint, long, or real✔️A value at which to evaluate the function.
alphaint, long, or real✔️A parameter of the distribution.
betaint, long, or real✔️A parameter of the distribution.

Returns

The cumulative beta distribution function.

Examples

datatable(x:double, alpha:double, beta:double, comment:string)
[
    0.9, 10.0, 20.0, "Valid input",
    1.5, 10.0, 20.0, "x > 1, yields NaN",
    double(-10), 10.0, 20.0, "x < 0, yields NaN",
    0.1, double(-1.0), 20.0, "alpha is < 0, yields NaN"
]
| extend b = beta_cdf(x, alpha, beta)

Output

xalphabetacommentb
0.91020Valid input0.999999999999959
1.51020x > 1, yields NaNNaN
-101020x < 0, yields NaNNaN
0.1-120alpha is < 0, yields NaNNaN
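
The inverse relationship stated above can be checked directly. The following query is an illustrative sketch (with an arbitrary x of 0.4) that applies beta_inv() to the output of beta_cdf() and returns the original x, up to floating-point rounding:

print x = beta_inv(beta_cdf(0.4, 10.0, 20.0), 10.0, 20.0)

The result is 0.4 (up to floating-point precision).
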
  • For computing the inverse of the beta cumulative probability density function, see beta-inv().
  • For computing the probability density function, see beta-pdf().

12.38 - beta_inv()

Learn how to use the beta_inv() function to return the inverse of the beta cumulative probability density function.

Returns the inverse of the beta cumulative probability density function.

If probability = beta_cdf(x,…), then beta_inv(probability,…) = x.

The beta distribution can be used in project planning to model probable completion times given an expected completion time and variability.

Syntax

beta_inv(probability,alpha,beta)

Parameters

NameTypeRequiredDescription
probabilityint, long, or real✔️A probability associated with the beta distribution.
alphaint, long, or real✔️A parameter of the distribution.
betaint, long, or real✔️A parameter of the distribution.

Returns

The inverse of the beta cumulative probability density function, beta_cdf().

Examples

datatable(p:double, alpha:double, beta:double, comment:string)
[
    0.1, 10.0, 20.0, "Valid input",
    1.5, 10.0, 20.0, "p > 1, yields null",
    0.1, double(-1.0), 20.0, "alpha is < 0, yields NaN"
]
| extend b = beta_inv(p, alpha, beta)

Output

palphabetacommentb
0.11020Valid input0.226415022388749
1.51020p > 1, yields null
0.1-120alpha is < 0, yields NaNNaN
  • For computing the cumulative beta distribution function, see beta-cdf().
  • For computing the beta probability density function, see beta-pdf().

12.39 - beta_pdf()

Learn how to use the beta_pdf() function to return the beta probability density function.

Returns the beta probability density function.

The beta distribution is commonly used to study variation in the percentage of something across samples, such as the fraction of the day people spend watching television.

Syntax

beta_pdf(x, alpha, beta)

Parameters

NameTypeRequiredDescription
xint, long, or real✔️A value at which to evaluate the function.
alphaint, long, or real✔️A parameter of the distribution.
betaint, long, or real✔️A parameter of the distribution.

Returns

The beta probability density function.

Examples

datatable(x:double, alpha:double, beta:double, comment:string)
[
    0.5, 10.0, 20.0, "Valid input",
    1.5, 10.0, 20.0, "x > 1, yields NaN",
    double(-10), 10.0, 20.0, "x < 0, yields NaN",
    0.1, double(-1.0), 20.0, "alpha is < 0, yields NaN"
]
| extend r = beta_pdf(x, alpha, beta)

Output

xalphabetacommentr
0.51020Valid input0.746176019310951
1.51020x > 1, yields NaNNaN
-101020x < 0, yields NaNNaN
0.1-120alpha is < 0, yields NaNNaN
  • For computing the inverse of the beta cumulative probability density function, see beta-inv().
  • For the standard cumulative beta distribution function, see beta-cdf().

12.40 - bin_at()

Learn how to use the bin_at() function to round values down to a fixed-size bin.

Returns the value rounded down to the nearest bin size, which is aligned to a fixed reference point.

In contrast to the bin() function, where the point of alignment is predefined, bin_at() allows you to define a fixed point for alignment. Results can align before or after the fixed point.

Syntax

bin_at (value,bin_size,fixed_point)

Parameters

NameTypeRequiredDescription
valueint, long, real, timespan, or datetime✔️The value to round.
bin_sizeint, long, real, or timespan✔️The size of each bin.
fixed_pointint, long, real, timespan, or datetime✔️A constant of the same type as value, which is used as a fixed reference point.

Returns

The nearest multiple of bin_size below the given value that aligns to the specified fixed_point.

Examples

In the following example, value is rounded down to the nearest bin_size that aligns to the fixed_point.

print bin_at(6.5, 2.5, 7)

Output

print_0
4.5

In the following example, a one-hour timespan is binned into daily bins aligned to a 12-hour fixed point. Because the bins start at 12:00, the value 01:00 falls into the bin that starts at -12:00:00, that is, 12:00 on the previous day.

print bin_at(time(1h), 1d, 12h)

Output

print_0
-12:00:00

In the following example, daily bins align to noon.

print bin_at(datetime(2017-05-15 10:20:00.0), 1d, datetime(1970-01-01 12:00:00.0))

Output

print_0
2017-05-14T12:00:00Z

In the following example, bins are weekly and align to the start of Sunday June 6, 2017. The example returns a bin aligned to Sundays.

print bin_at(datetime(2017-05-17 10:20:00.0), 7d, datetime(2017-06-04 00:00:00.0))

Output

print_0
2017-05-14T00:00:00Z

In the following example, the number of events is summed into daily bins aligned to the fixed_point date and time. The fixed_point value is included in one of the returned bins.

datatable(Date:datetime, NumOfEvents:int)[
datetime(2018-02-24T15:14),3,
datetime(2018-02-24T15:24),4,
datetime(2018-02-23T16:14),4,
datetime(2018-02-23T17:29),4,
datetime(2018-02-26T15:14),5]
| summarize TotalEvents=sum(NumOfEvents) by bin_at(Date, 1d, datetime(2018-02-24 15:14:00.0000000)) 

Output

DateTotalEvents
2018-02-23T15:14:00Z8
2018-02-24T15:14:00Z7
2018-02-26T15:14:00Z5

12.41 - bin_auto()

Learn how to use the bin_auto() function to round values down to a fixed-size bin.

Rounds values down to a fixed-size bin, with control over the bin size and starting point provided by a query property.

Syntax

bin_auto (value)

Parameters

NameTypeRequiredDescription
valueint, long, real, timespan, or datetime✔️The value to round into bins.

To control the bin size and starting point, set the following parameters before using the function.

NameTypeRequiredDescription
query_bin_auto_sizeint, long, real, or timespan✔️Indicates the size of each bin.
query_bin_auto_atint, long, real, or timespanIndicates a value of value that serves as a “fixed point” for which bin_auto(fixed_point) == fixed_point. The default is 0.

Returns

The nearest multiple of query_bin_auto_size below value, shifted so that query_bin_auto_at will be translated into itself.

Examples

set query_bin_auto_size=1h;
set query_bin_auto_at=datetime(2017-01-01 00:05);
range Timestamp from datetime(2017-01-01 00:05) to datetime(2017-01-01 02:00) step 1m
| summarize count() by bin_auto(Timestamp)

Output

Timestampcount_
2017-01-01 00:05:00.000000060
2017-01-01 01:05:00.000000056

12.42 - bin()

Learn how to use the bin() function to round values down to an integer multiple of a given bin size.

Rounds values down to an integer multiple of a given bin size.

Used frequently in combination with summarize by .... If you have a scattered set of values, they’ll be grouped into a smaller set of specific values.

Syntax

bin(value,roundTo)

Parameters

NameTypeRequiredDescription
valueint, long, real, timespan, or datetime✔️The value to round down.
roundToint, long, real, or timespan✔️The “bin size” that divides value.

Returns

The nearest multiple of roundTo below value. Null values, a null bin size, or a negative bin size will result in null.

Examples

Numeric bin

print bin(4.5, 1)

Output

print_0
4
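
As described in the Returns section, a negative bin size yields null. The following query is an illustrative sketch:

print bin(4.5, -1)

The result is null, so the output cell is empty.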

Timespan bin

print bin(time(16d), 7d)

Output

print_0
14.00:00:00

Datetime bin

print bin(datetime(1970-05-11 13:45:07), 1d)

Output

print_0
1970-05-11T00:00:00Z

Pad a table with null bins

When there are bins with no corresponding rows in the table, we recommend padding the table with those bins. The following query looks at strong wind storm events in California for a week in April. However, there are no events on some of the days.

let Start = datetime('2007-04-07');
let End = Start + 7d;
StormEvents
| where StartTime between (Start .. End)
| where State == "CALIFORNIA" and EventType == "Strong Wind"
| summarize PropertyDamage=sum(DamageProperty) by bin(StartTime, 1d)

Output

StartTimePropertyDamage
2007-04-08T00:00:00Z3000
2007-04-11T00:00:00Z1000
2007-04-12T00:00:00Z105000

To represent the full week, the following query pads the result table with rows for the missing days, which appear with a damage value of zero. Here’s a step-by-step explanation of the process:

  1. Use the union operator to add more rows to the table.
  2. The range operator produces a table that has a single row and column.
  3. The mv-expand operator over the range function creates as many rows as there are bins between StartTime and EndTime.
  4. Use a PropertyDamage of 0.
  5. The summarize operator groups together bins from the original table with the bins produced by the union expression. This process ensures that the output has one row per bin whose value is either zero or the original sum.
let Start = datetime('2007-04-07');
let End = Start + 7d;
StormEvents
| where StartTime between (Start .. End)
| where State == "CALIFORNIA" and EventType == "Strong Wind"
| union (
    range x from 1 to 1 step 1
    | mv-expand StartTime=range(Start, End, 1d) to typeof(datetime)
    | extend PropertyDamage=0
    )
| summarize PropertyDamage=sum(DamageProperty) by bin(StartTime, 1d)

Output

StartTimePropertyDamage
2007-04-07T00:00:00Z0
2007-04-08T00:00:00Z3000
2007-04-09T00:00:00Z0
2007-04-10T00:00:00Z0
2007-04-11T00:00:00Z1000
2007-04-12T00:00:00Z105000
2007-04-13T00:00:00Z0
2007-04-14T00:00:00Z0

12.43 - binary_and()

Learn how to use the binary_and() function to compare bits in corresponding operands.

Returns a result of the bitwise AND operation between two values.

Syntax

binary_and(value1,value2)

Parameters

NameTypeRequiredDescription
value1long✔️The left-hand value of the bitwise AND operation.
value2long✔️The right-hand value of the bitwise AND operation.

Returns

Returns logical AND operation on a pair of numbers: value1 & value2.
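
Example

The following query is an illustrative sketch (with arbitrary values) that ANDs 6 (binary 110) with 3 (binary 011):

print Result = binary_and(6, 3)

Output

Result
2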

12.44 - binary_not()

Learn how to use the binary_not() function to return a bitwise negation of the input value.

Returns a bitwise negation of the input value.

Syntax

binary_not(value)

Parameters

NameTypeRequiredDescription
valuelong✔️The value to negate.

Returns

Returns the bitwise negation (logical NOT) of the number: ~value.

Example

print result = binary_not(100)

Output

result
-101

12.45 - binary_or()

Learn how to use the bianry_or() function to perform a bitwise OR operation of the two values.

Returns a result of the bitwise or operation of the two values.

Syntax

binary_or(value1, value2 )

Parameters

NameTypeRequiredDescription
value1long✔️The left-hand value of the bitwise OR operation.
value2long✔️The right-hand value of the bitwise OR operation.

Returns

Returns logical OR operation on a pair of numbers: value1 | value2.
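
Example

The following query is an illustrative sketch (with arbitrary values) that ORs 4 (binary 100) with 3 (binary 011):

print Result = binary_or(4, 3)

Output

Result
7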

12.46 - binary_shift_left()

Learn how to use the binary_shift_left() function to perform a binary shift left operation on a pair of numbers.

Returns binary shift left operation on a pair of numbers.

Syntax

binary_shift_left(value,shift)

Parameters

NameTypeRequiredDescription
valueint✔️The value to shift left.
shiftint✔️The number of bits to shift left.

Returns

Returns the binary shift left operation on a pair of numbers: value << (shift % 64). If shift is negative, a null value is returned.

Example

print Result = binary_shift_left(1,2)

Output

Result
4
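
Because the shift count is taken modulo 64, as described in the Returns section, a shift of 65 behaves like a shift of 1 in the following illustrative sketch (assuming that modulo behavior):

print Result = binary_shift_left(1, 65)

The expected result is 2.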

12.47 - binary_shift_right()

Learn how to use the binary_shift_right() function to perform a binary shift right operation on a pair of numbers.

Returns binary shift right operation on a pair of numbers.

Syntax

binary_shift_right(value,shift)

Parameters

NameTypeRequiredDescription
valueint✔️The value to shift right.
shiftint✔️The number of bits to shift right.

Returns

Returns the binary shift right operation on a pair of numbers: value >> (shift % 64). If shift is negative, a null value is returned.

Examples

print Result = binary_shift_right(1,2)

Output

Result
0

12.48 - binary_xor()

Learn how to use the binary_xor() function to perform the bitwise xor operation on a pair of values.

Returns a result of the bitwise xor operation of the two values.

Syntax

binary_xor(value1,value2)

Parameters

NameTypeRequiredDescription
value1int✔️The left-side value of the XOR operation.
value2int✔️The right-side value of the XOR operation.

Returns

Returns logical XOR operation on a pair of numbers: value1 ^ value2.

Examples

print Result = binary_xor(1,1)

Output

Result
0

print Result = binary_xor(1,2)

Output

Result
3

12.49 - bitset_count_ones()

Learn how to use the bitset_count_ones() function to return the number of set bits in the binary representation of a number.

Returns the number of set bits in the binary representation of a number.

Syntax

bitset_count_ones(value)

Parameters

NameTypeRequiredDescription
valueint✔️The value for which to calculate the number of set bits.

Returns

Returns the number of set bits in the binary representation of a number.

Example

// 42 = 32+8+2 : b'00101010' == 3 bits set
print ones = bitset_count_ones(42) 

Output

ones
3

12.50 - case()

Learn how to use the case() function to evaluate a list of predicates and return the first expression for which the predicate evaluates to true.

Evaluates a list of predicates and returns the first result expression whose predicate is satisfied.

If none of the predicates return true, the result of the else expression is returned. All predicate arguments must be expressions that evaluate to a boolean value. All then arguments and the else argument must be of the same type.

Syntax

case(predicate_1, then_1, [predicate_2, then_2, …] else)

Parameters

NameTypeRequiredDescription
predicatestring✔️An expression that evaluates to a boolean value.
thenstring✔️An expression that gets evaluated and its value is returned from the function if predicate is the first predicate that evaluates to true.
elsestring✔️An expression that gets evaluated and its value is returned from the function if none of the predicate_i expressions evaluates to true.

Returns

The value of the first then_i whose predicate_i evaluates to true, or the value of else if none of the predicates is satisfied.

Example

range Size from 1 to 15 step 2
| extend bucket = case(Size <= 3, "Small", 
                       Size <= 10, "Medium", 
                       "Large")

Output

Sizebucket
1Small
3Small
5Medium
7Medium
9Medium
11Large
13Large
15Large

12.51 - ceiling()

Learn how to use the ceiling() function to calculate the smallest integer greater than, or equal to, the specified numeric expression.

Calculates the smallest integer greater than, or equal to, the specified numeric expression.

Syntax

ceiling(number)

Parameters

NameTypeRequiredDescription
numberint, long, or real✔️The value to round up.

Returns

The smallest integer greater than, or equal to, the specified numeric expression.

Examples

print c1 = ceiling(-1.1), c2 = ceiling(0), c3 = ceiling(0.9)

Output

c1c2c3
-101

12.52 - coalesce()

Learn how to use the coalesce() function to evaluate a list of expressions to return the first non-null expression.

Evaluates a list of expressions and returns the first non-null (or non-empty for string) expression.

Syntax

coalesce(arg,arg_2,[arg_3,...])

Parameters

NameTypeRequiredDescription
argscalar✔️The expression to be evaluated.

Returns

The value of the first arg whose value isn’t null (or isn’t empty, for string expressions).

Example

print result=coalesce(tolong("not a number"), tolong("42"), 33)

Output

result
42
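
Empty strings are skipped in the same way as nulls. The following query is an illustrative sketch that returns the first non-empty string:

print result=coalesce("", "", "fallback")

Output

result
fallback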

12.53 - column_ifexists()

Learn how to use the column_ifexists() function to return a reference to the column if it exists.

Returns the column, if the column exists. Otherwise, it returns the defaultValue.

Syntax

column_ifexists(columnName,defaultValue)

Parameters

NameTypeRequiredDescription
columnNamestring✔️The name of the column to return.
defaultValuescalar✔️The default column to return if columnName doesn’t exist in the table. This value can be any scalar expression. For example, a reference to another column.

Returns

If columnName exists, then returns the column. Otherwise, it returns the defaultValue column.

Example

This example returns the default State column, because a column named Capital doesn’t exist in the StormEvents table.

StormEvents | project column_ifexists("Capital", State)

Output

This output shows the first 10 rows of the default State column.

State
ATLANTIC SOUTH
FLORIDA
FLORIDA
GEORGIA
MISSISSIPPI
MISSISSIPPI
MISSISSIPPI
MISSISSIPPI
AMERICAN SAMOA
KENTUCKY

12.54 - convert_angle()

Learn how to use the convert_angle() function to convert an angle input value from one unit to another.

Convert an angle value from one unit to another.

Syntax

convert_angle(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • Arcminute
  • Arcsecond
  • Centiradian
  • Deciradian
  • Degree
  • Gradian
  • Microdegree
  • Microradian
  • Millidegree
  • Milliradian
  • Nanodegree
  • Nanoradian
  • NatoMil
  • Radian
  • Revolution
  • Tilt

Returns

Returns the input value converted from one angle unit to another. Invalid units return null.

Example

print result = convert_angle(1.2, 'Degree', 'Arcminute')

Output

result
72

12.55 - convert_energy()

Learn how to use the convert_energy() function to convert an energy input value from one unit to another.

Convert an energy value from one unit to another.

Syntax

convert_energy(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • BritishThermalUnit
  • Calorie
  • DecathermEc
  • DecathermImperial
  • DecathermUs
  • ElectronVolt
  • Erg
  • FootPound
  • GigabritishThermalUnit
  • GigaelectronVolt
  • Gigajoule
  • GigawattDay
  • GigawattHour
  • HorsepowerHour
  • Joule
  • KilobritishThermalUnit
  • Kilocalorie
  • KiloelectronVolt
  • Kilojoule
  • KilowattDay
  • KilowattHour
  • MegabritishThermalUnit
  • Megacalorie
  • MegaelectronVolt
  • Megajoule
  • MegawattDay
  • MegawattHour
  • Millijoule
  • TeraelectronVolt
  • TerawattDay
  • TerawattHour
  • ThermEc
  • ThermImperial
  • ThermUs
  • WattDay
  • WattHour

Returns

Returns the input value converted from one energy unit to another. Invalid units return null.

Example

print result = convert_energy(1.2, 'Joule', 'BritishThermalUnit')

Output

result
0.00113738054437598

12.56 - convert_force()

Learn how to use the convert_force() function to convert a force input value from one unit to another.

Convert a force value from one unit to another.

Syntax

convert_force(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • Decanewton
  • Dyn
  • KilogramForce
  • Kilonewton
  • KiloPond
  • KilopoundForce
  • Meganewton
  • Micronewton
  • Millinewton
  • Newton
  • OunceForce
  • Poundal
  • PoundForce
  • ShortTonForce
  • TonneForce

Returns

Returns the input value converted from one force unit to another. Invalid units return null.

Example

print result = convert_force(1.2, 'Newton', 'Decanewton')

Output

result
0.12

12.57 - convert_length()

Learn how to use the convert_length() function to convert a length input value from one unit to another.

Convert a length value from one unit to another.

Syntax

convert_length(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • Angstrom
  • AstronomicalUnit
  • Centimeter
  • Chain
  • DataMile
  • Decameter
  • Decimeter
  • DtpPica
  • DtpPoint
  • Fathom
  • Foot
  • Hand
  • Hectometer
  • Inch
  • KilolightYear
  • Kilometer
  • Kiloparsec
  • LightYear
  • MegalightYear
  • Megaparsec
  • Meter
  • Microinch
  • Micrometer
  • Mil
  • Mile
  • Millimeter
  • Nanometer
  • NauticalMile
  • Parsec
  • PrinterPica
  • PrinterPoint
  • Shackle
  • SolarRadius
  • Twip
  • UsSurveyFoot
  • Yard

Returns

Returns the input value converted from one length unit to another. Invalid units return null.

Example

print result = convert_length(1.2, 'Meter', 'Foot')

Output

result
3.93700787401575

12.58 - convert_mass()

Learn how to use the convert_mass() function to convert a mass input value from one unit to another.

Convert a mass value from one unit to another.

Syntax

convert_mass(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • Centigram
  • Decagram
  • Decigram
  • EarthMass
  • Grain
  • Gram
  • Hectogram
  • Kilogram
  • Kilopound
  • Kilotonne
  • LongHundredweight
  • LongTon
  • Megapound
  • Megatonne
  • Microgram
  • Milligram
  • Nanogram
  • Ounce
  • Pound
  • ShortHundredweight
  • ShortTon
  • Slug
  • SolarMass
  • Stone
  • Tonne

Returns

Returns the input value converted from one mass unit to another. Invalid units return null.

Example

print result = convert_mass(1.2, 'Kilogram', 'Pound')

Output

result
2.64554714621853

12.59 - convert_speed()

Learn how to use the convert_speed() function to convert a speed input value from one unit to another.

Convert a speed value from one unit to another.

Syntax

convert_speed(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • CentimeterPerHour
  • CentimeterPerMinute
  • CentimeterPerSecond
  • DecimeterPerMinute
  • DecimeterPerSecond
  • FootPerHour
  • FootPerMinute
  • FootPerSecond
  • InchPerHour
  • InchPerMinute
  • InchPerSecond
  • KilometerPerHour
  • KilometerPerMinute
  • KilometerPerSecond
  • Knot
  • MeterPerHour
  • MeterPerMinute
  • MeterPerSecond
  • MicrometerPerMinute
  • MicrometerPerSecond
  • MilePerHour
  • MillimeterPerHour
  • MillimeterPerMinute
  • MillimeterPerSecond
  • NanometerPerMinute
  • NanometerPerSecond
  • UsSurveyFootPerHour
  • UsSurveyFootPerMinute
  • UsSurveyFootPerSecond
  • YardPerHour
  • YardPerMinute
  • YardPerSecond

Returns

Returns the input value converted from one speed unit to another. Invalid units return null.

Example

print result = convert_speed(1.2, 'MeterPerSecond', 'CentimeterPerHour')

Output

result
432000

12.60 - convert_temperature()

Learn how to use the convert_temperature() function to convert a temperature input value from one unit to another.

Convert a temperature value from one unit to another.

Syntax

convert_temperature(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • DegreeCelsius
  • DegreeDelisle
  • DegreeFahrenheit
  • DegreeNewton
  • DegreeRankine
  • DegreeReaumur
  • DegreeRoemer
  • Kelvin
  • MillidegreeCelsius
  • SolarTemperature

Returns

Returns the input value converted from one temperature unit to another. Invalid units return null.

Example

print result = convert_temperature(1.2, 'Kelvin', 'DegreeCelsius')

Output

result
-271.95

12.61 - convert_volume()

Learn how to use the convert_volume() function to convert a volume input value from one unit to another.

Convert a volume value from one unit to another.

Syntax

convert_volume(value,from,to)

Parameters

NameTypeRequiredDescription
valuereal✔️The value to be converted.
fromstring✔️The unit to convert from. For possible values, see Conversion units.
tostring✔️The unit to convert to. For possible values, see Conversion units.

Conversion units

  • AcreFoot
  • AuTablespoon
  • BoardFoot
  • Centiliter
  • CubicCentimeter
  • CubicDecimeter
  • CubicFoot
  • CubicHectometer
  • CubicInch
  • CubicKilometer
  • CubicMeter
  • CubicMicrometer
  • CubicMile
  • CubicMillimeter
  • CubicYard
  • Decaliter
  • DecausGallon
  • Deciliter
  • DeciusGallon
  • HectocubicFoot
  • HectocubicMeter
  • Hectoliter
  • HectousGallon
  • ImperialBeerBarrel
  • ImperialGallon
  • ImperialOunce
  • ImperialPint
  • KilocubicFoot
  • KilocubicMeter
  • KiloimperialGallon
  • Kiloliter
  • KilousGallon
  • Liter
  • MegacubicFoot
  • MegaimperialGallon
  • Megaliter
  • MegausGallon
  • MetricCup
  • MetricTeaspoon
  • Microliter
  • Milliliter
  • OilBarrel
  • UkTablespoon
  • UsBeerBarrel
  • UsCustomaryCup
  • UsGallon
  • UsLegalCup
  • UsOunce
  • UsPint
  • UsQuart
  • UsTablespoon
  • UsTeaspoon

Returns

Returns the input value converted from one volume unit to another. Invalid units return null.

Example

print result = convert_volume(1.2, 'CubicMeter', 'AcreFoot')

Output

result
0.0009728568

12.62 - cos()

Learn how to use the cos() function to return the cosine of the input value.

Returns the cosine function value of the specified angle. The angle is specified in radians.

Syntax

cos(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The value in radians for which to calculate the cosine.

Returns

The cosine of number of radians.

Example

print cos(1)

Output

result
0.54030230586813977

12.63 - cot()

Learn how to use the cot() function to calculate the trigonometric cotangent of the specified angle in radians.

Calculates the trigonometric cotangent of the specified angle, in radians.

Syntax

cot(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The value for which to calculate the cotangent.

Returns

The cotangent function value for number.

Example

print cot(1)

Output

result
0.64209261593433065

12.64 - countof()

Learn how to use the countof() function to count the occurrences of a substring in a string.

Counts occurrences of a substring in a string. Plain string matches may overlap; regex matches don’t.

Syntax

countof(source, search [, kind])

Parameters

NameTypeRequiredDescription
sourcestring✔️The value to search.
searchstring✔️The value or regular expression to match inside source.
kindstringThe value normal or regex. The default is normal.

Returns

The number of times that the search value can be matched in the source string. Plain string matches may overlap; regex matches don’t.

Examples

Function callResult
countof("aaa", "a")3
countof("aaaa", "aa")3 (not 2!)
countof("ababa", "ab", "normal")2
countof("ababa", "aba")2
countof("ababa", "aba", "regex")1
countof("abcabc", "a.c", "regex")2

12.65 - current_cluster_endpoint()

Learn how to use the current_cluster_endpoint() function to return the network endpoint of the cluster being queried as a string type value.

Returns the network endpoint (DNS name) of the current cluster (or, in Microsoft Fabric, the current Eventhouse) being queried.

Syntax

current_cluster_endpoint()

Returns

The network endpoint (DNS name) of the current cluster (or, in Microsoft Fabric, the current Eventhouse) being queried, as a value of type string.

Example

print strcat("This query executed on: ", current_cluster_endpoint())

12.66 - current_database()

Learn how to use the current_database() function to return the name of the database in scope as a string type value.

Returns the name of the database in scope (database that all query entities are resolved against if no other database is specified).

Syntax

current_database()

Returns

The name of the database in scope as a value of type string.

Example

print strcat("Database in scope: ", current_database())

12.67 - current_principal_details()

Learn how to use the current_principal_details() function to return the details of the principal running the query.

Returns details of the principal running the query.

Syntax

current_principal_details()

Returns

The details of the current principal as a dynamic. The following table describes the returned fields.

FieldDescription
UserPrincipalNameThe sign-in identifier for users. For more information, see UPN.
IdentityProviderThe source that validates the identity of the principal.
AuthorityThe Microsoft Entra tenant ID.
MfaIndicates the use of multifactor authentication. For more information, see Access token claims reference.
TypeThe category of the principal: aaduser, aadapp, or aadgroup.
DisplayNameThe user-friendly name for the principal that is displayed in the UI.
ObjectIdThe Microsoft Entra object ID for the principal.
FQNThe Fully Qualified Name (FQN) of the principal. Valuable for security role management commands. For more information, see Referencing security principals.
CountryThe user’s country or region. This property is returned if the information is present. The value is a standard two-letter country or region code, for example, FR, JP, and SZ.
TenantCountryThe resource tenant’s country or region, set at a tenant level by an admin. This property is returned if the information is present. The value is a standard two-letter country or region code, for example, FR, JP, and SZ.
TenantRegionThe region of the resource tenant. This property is returned if the information is present. The value is a standard two-letter country or region code, for example, FR, JP, and SZ.

Example

print details=current_principal_details()

Example output

details
{
“Country”: “DE”,
“TenantCountry”: “US”,
“TenantRegion”: “WW”,
“UserPrincipalName”: “user@fabrikam.com”,
“IdentityProvider”: “https://sts.windows.net”,
“Authority”: “aaaabbbb-0000-cccc-1111-dddd2222eeee”,
“Mfa”: “True”,
“Type”: “AadUser”,
“DisplayName”: “James Smith (upn: user@fabrikam.com)”,
“ObjectId”: “aaaaaaaa-0000-1111-2222-bbbbbbbbbbbb”,
“FQN”: null,
“Notes”: null
}

12.68 - current_principal_is_member_of()

Learn how to use the current_principal_is_member_of() function to check the identity of the principal running the query.

Checks group membership or principal identity of the current principal running the query.

Syntax

current_principal_is_member_of(group)

Parameters

NameTypeRequiredDescription
groupdynamic✔️An array of string literals in which each literal represents a Microsoft Entra principal. See examples for Microsoft Entra principals.

Returns

The function returns true if the current principal running the query is successfully matched for at least one input argument. If not, the function returns false.

Examples

print result=current_principal_is_member_of(
    'aaduser=user1@fabrikam.com', 
    'aadgroup=group1@fabrikam.com',
    'aadapp=66ad1332-3a94-4a69-9fa2-17732f093664;72f988bf-86f1-41af-91ab-2d7cd011db47'
    )

Output

result
false

Using dynamic array instead of multiple arguments:

print result=current_principal_is_member_of(
    dynamic([
    'aaduser=user1@fabrikam.com', 
    'aadgroup=group1@fabrikam.com',
    'aadapp=66ad1332-3a94-4a69-9fa2-17732f093664;72f988bf-86f1-41af-91ab-2d7cd011db47'
    ]))

Output

result
false

12.69 - current_principal()

Learn how to use the current_principal() function to return the name of the principal running the query.

Returns the current principal name that runs the query.

Syntax

current_principal()

Returns

The current principal fully qualified name (FQN) as a string.
The string format is:
PrincipalType=PrincipalId;TenantId

Example

print fqn=current_principal()

Example output

fqn
aaduser=346e950e-4a62-42bf-96f5-4cf4eac3f11e;72f988bf-86f1-41af-91ab-2d7cd011db47

12.70 - cursor_after()

Learn how to use the cursor_after() function to compare the ingestion time of the records of a table against the database cursor time.

A predicate run over the records of a table to compare their ingestion time against a database cursor.

This function requires that the IngestionTime policy is enabled on the table.

Syntax

cursor_after(RHS)

Parameters

NameTypeRequiredDescription
RHSstring✔️Either an empty string literal or a valid database cursor value.

Returns

A scalar value of type bool that indicates whether the record was ingested after the database cursor RHS (true) or not (false).
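
Example

The following sketch shows a typical incremental-processing pattern: count only the records ingested after a previously stored database cursor. The table name MyTable and the cursor value are hypothetical placeholders, and the query assumes the IngestionTime policy is enabled on the table.

MyTable
| where cursor_after('636040929866477946')   // only records ingested after this stored cursor
| count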

12.71 - cursor_before_or_at()

Learn how to use the cursor_before_or_at() function to compare the ingestion time of the records of a table against the database cursor time.

A predicate function run over the records of a table to compare their ingestion time against the database cursor time.

This function requires that the IngestionTime policy is enabled on the table.

Syntax

cursor_before_or_at(RHS)

Parameters

NameTypeRequiredDescription
RHSstring✔️Either an empty string literal or a valid database cursor value.

Returns

A scalar value of type bool that indicates whether the record was ingested before or at the database cursor RHS (true) or not (false).
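
Example

A minimal sketch that combines cursor_after() and cursor_before_or_at() to select only the records ingested between two stored database cursors. The table name MyTable and both cursor values are hypothetical placeholders.

MyTable
| where cursor_after('636040929866477946') and cursor_before_or_at('636040989866477946')
| count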

12.72 - cursor_current()

Learn how to use the cursor_current() function to return a string type value.

Retrieves the current value of the cursor of the database in scope.

Syntax

cursor_current()

Returns

Returns a single value of type string that encodes the current value of the cursor of the database in scope.
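
Example

The following sketch retrieves the current database cursor so that it can be stored and later passed to cursor_after() in a follow-up incremental query.

print CurrentCursor = cursor_current()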

12.73 - datetime_add()

Learn how to use the datetime_add() function to calculate a new datetime.

Calculates a new datetime from a specified period multiplied by a specified amount, added to, or subtracted from a specified datetime.

Syntax

datetime_add(period,amount,datetime)

Parameters

NameTypeRequiredDescription
periodstring✔️The length of time by which to increment.
amountint✔️The number of periods to add to or subtract from datetime.
datetimedatetime✔️The date to increment by the result of the period x amount calculation.

Possible values of period:

  • Year
  • Quarter
  • Month
  • Week
  • Day
  • Hour
  • Minute
  • Second
  • Millisecond
  • Microsecond
  • Nanosecond

Returns

A datetime value that results from adding (or subtracting) the specified number of periods to the given datetime.

Examples

Period

print  year = datetime_add('year',1,make_datetime(2017,1,1)),
quarter = datetime_add('quarter',1,make_datetime(2017,1,1)),
month = datetime_add('month',1,make_datetime(2017,1,1)),
week = datetime_add('week',1,make_datetime(2017,1,1)),
day = datetime_add('day',1,make_datetime(2017,1,1)),
hour = datetime_add('hour',1,make_datetime(2017,1,1)),
minute = datetime_add('minute',1,make_datetime(2017,1,1)),
second = datetime_add('second',1,make_datetime(2017,1,1))

Output

yearquartermonthweekdayhourminutesecond
2018-01-01 00:00:00.00000002017-04-01 00:00:00.00000002017-02-01 00:00:00.00000002017-01-08 00:00:00.00000002017-01-02 00:00:00.00000002017-01-01 01:00:00.00000002017-01-01 00:01:00.00000002017-01-01 00:00:01.0000000

Amount

print  year = datetime_add('year',-5,make_datetime(2017,1,1)),
quarter = datetime_add('quarter',12,make_datetime(2017,1,1)),
month = datetime_add('month',-15,make_datetime(2017,1,1)),
week = datetime_add('week',100,make_datetime(2017,1,1))

Output

yearquartermonthweek
2012-01-01T00:00:00Z2020-01-01T00:00:00Z2015-10-01T00:00:00Z2018-12-02T00:00:00Z

12.74 - datetime_diff()

Learn how to use the datetime_diff() function to calculate the period between two datetime values.

Calculates the number of the specified periods between two datetime values.

Syntax

datetime_diff(period,datetime1,datetime2)

Parameters

NameTypeRequiredDescription
periodstring✔️The measurement of time used to calculate the return value. See possible values.
datetime1datetime✔️The left-hand side of the subtraction equation.
datetime2datetime✔️The right-hand side of the subtraction equation.

Possible values of period

These values are case insensitive:

  • Year
  • Quarter
  • Month
  • Week
  • Day
  • Hour
  • Minute
  • Second
  • Millisecond
  • Microsecond
  • Nanosecond

Returns

An integer that represents the number of periods in the result of the subtraction (datetime1 - datetime2).

Example

print
year = datetime_diff('year',datetime(2017-01-01),datetime(2000-12-31)),
quarter = datetime_diff('quarter',datetime(2017-07-01),datetime(2017-03-30)),
month = datetime_diff('month',datetime(2017-01-01),datetime(2015-12-30)),
week = datetime_diff('week',datetime(2017-10-29 00:00),datetime(2017-09-30 23:59)),
day = datetime_diff('day',datetime(2017-10-29 00:00),datetime(2017-09-30 23:59)),
hour = datetime_diff('hour',datetime(2017-10-31 01:00),datetime(2017-10-30 23:59)),
minute = datetime_diff('minute',datetime(2017-10-30 23:05:01),datetime(2017-10-30 23:00:59)),
second = datetime_diff('second',datetime(2017-10-30 23:00:10.100),datetime(2017-10-30 23:00:00.900)),
millisecond = datetime_diff('millisecond',datetime(2017-10-30 23:00:00.200100),datetime(2017-10-30 23:00:00.100900)),
microsecond = datetime_diff('microsecond',datetime(2017-10-30 23:00:00.1009001),datetime(2017-10-30 23:00:00.1008009)),
nanosecond = datetime_diff('nanosecond',datetime(2017-10-30 23:00:00.0000000),datetime(2017-10-30 23:00:00.0000007))

Output

yearquartermonthweekdayhourminutesecondmillisecondmicrosecondnanosecond
172135292510100100-700

12.75 - datetime_list_timezones()

Get a list of all supported timezones.

Returns the list of timezones supported for use as a time-zone specification.

Syntax

datetime_list_timezones()

Parameters

None. This function doesn't have any parameters.

Returns

A list of timezones supported by the Internet Assigned Numbers Authority (IANA) Time Zone Database.

Example

print datetime_list_timezones()

Output

The result is a single dynamic array that contains the supported IANA timezone names (the full list is omitted here for brevity).
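
The following sketch (using the standard mv-expand operator) expands the returned array so that each supported timezone name appears in its own row.

print Timezone = datetime_list_timezones()
| mv-expand Timezone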

12.76 - datetime_local_to_utc()

Learn how to use the datetime_local_to_utc() function to convert local datetime to UTC datetime.

Converts local datetime to UTC datetime using a time-zone specification.

Syntax

datetime_local_to_utc(from,timezone)

Parameters

NameTypeRequiredDescription
fromdatetime✔️The local datetime to convert.
timezonestring✔️The timezone of the desired datetime. The value must be one of the supported timezones.

Returns

A UTC datetime that corresponds to the local datetime in the specified timezone.

Example

datatable(local_dt: datetime, tz: string)
[ datetime(2020-02-02 20:02:20), 'US/Pacific', 
  datetime(2020-02-02 20:02:20), 'America/Chicago', 
  datetime(2020-02-02 20:02:20), 'Europe/Paris']
| extend utc_dt = datetime_local_to_utc(local_dt, tz)

Output

local_dttzutc_dt
2020-02-02 20:02:20.0000000Europe/Paris2020-02-02 19:02:20.0000000
2020-02-02 20:02:20.0000000America/Chicago2020-02-03 02:02:20.0000000
2020-02-02 20:02:20.0000000US/Pacific2020-02-03 04:02:20.0000000
The following example shows the behavior around a daylight saving transition. In the Europe/Brussels timezone, clocks jump from 02:00 to 03:00 local time on 2022-03-27, so the local time 02:00 doesn't exist and doesn't round-trip back from UTC to the same local value.

range Local from datetime(2022-03-27 01:00:00.0000000) to datetime(2022-03-27 04:00:00.0000000) step 1h
| extend UTC=datetime_local_to_utc(Local, 'Europe/Brussels')
| extend BackToLocal=datetime_utc_to_local(UTC, 'Europe/Brussels')
| extend diff=Local-BackToLocal

Output

LocalUTCBackToLocaldiff
2022-03-27 02:00:00.00000002022-03-27 00:00:00.00000002022-03-27 01:00:00.000000001:00:00
2022-03-27 01:00:00.00000002022-03-27 00:00:00.00000002022-03-27 01:00:00.000000000:00:00
2022-03-27 03:00:00.00000002022-03-27 01:00:00.00000002022-03-27 03:00:00.000000000:00:00
2022-03-27 04:00:00.00000002022-03-27 02:00:00.00000002022-03-27 04:00:00.000000000:00:00

12.77 - datetime_part()

This article describes datetime_part().

Extracts the requested date part as an integer value.

Syntax

datetime_part(part,datetime)

Parameters

NameTypeRequiredDescription
partstring✔️Measurement of time to extract from date. See possible values.
datedatetime✔️The full date from which to extract part.

Possible values of part

  • Year
  • Quarter
  • Month
  • week_of_year
  • Day
  • DayOfYear
  • Hour
  • Minute
  • Second
  • Millisecond
  • Microsecond
  • Nanosecond

Returns

An integer representing the extracted part.

Example

let dt = datetime(2017-10-30 01:02:03.7654321); 
print 
year = datetime_part("year", dt),
quarter = datetime_part("quarter", dt),
month = datetime_part("month", dt),
weekOfYear = datetime_part("week_of_year", dt),
day = datetime_part("day", dt),
dayOfYear = datetime_part("dayOfYear", dt),
hour = datetime_part("hour", dt),
minute = datetime_part("minute", dt),
second = datetime_part("second", dt),
millisecond = datetime_part("millisecond", dt),
microsecond = datetime_part("microsecond", dt),
nanosecond = datetime_part("nanosecond", dt)

Output

yearquartermonthweekOfYeardaydayOfYearhourminutesecondmillisecondmicrosecondnanosecond
20174104430303123765765432765432100

12.78 - datetime_utc_to_local()

This article describes the datetime_utc_to_local function.

Converts UTC datetime to local datetime using a time-zone specification.

Syntax

datetime_utc_to_local(from,timezone)

Parameters

NameTypeRequiredDescription
fromdatetime✔️The UTC datetime to convert.
timezonestring✔️The timezone to convert to. This value must be one of the supported timezones.

Returns

A local datetime in the specified timezone that corresponds to the UTC datetime.

Example

print dt=now()
| extend pacific_dt = datetime_utc_to_local(dt, 'US/Pacific'), canberra_dt = datetime_utc_to_local(dt, 'Australia/Canberra')
| extend diff = pacific_dt - canberra_dt

Output

dtpacific_dtcanberra_dtdiff
2022-07-11 22:18:48.46786202022-07-11 15:18:48.46786202022-07-12 08:18:48.4678620-17:00:00

12.79 - dayofmonth()

Learn how to use the dayofmonth() function to return an integer representing the day of the month.

Returns an integer representing the day number of the given datetime.

Syntax

dayofmonth(date)

Parameters

NameTypeRequiredDescription
datedatetime✔️The datetime used to extract the day number.

Returns

An integer representing the day number of the given datetime.

Example

print result = dayofmonth(datetime(2015-12-14))

Output

result
14

12.80 - dayofweek()

Learn how to use the dayofweek() function to return the timespan since the preceding Sunday.

Returns the number of days since the preceding Sunday, as a timespan.

To convert timespan to int, see Convert timespan to integer.

Syntax

dayofweek(date)

Parameters

NameTypeRequiredDescription
datedatetime✔️The datetime for which to determine the day of week.

Returns

The timespan since midnight at the beginning of the preceding Sunday, rounded down to an integer number of days.

Examples

The following example returns 0, indicating that the specified datetime is a Sunday.

print
Timespan = dayofweek(datetime(1947-11-30 10:00:05))

Output

Timespan
00:00:00

The following example returns 1, indicating that the specified datetime is a Monday.

print
Timespan = dayofweek(datetime(1970-05-11))

Output

Timespan
1.00:00:00

Convert timespan to integer

The following example returns the number of days both as a timespan and as data type int.

let dow=dayofweek(datetime(1970-5-12));
print Timespan = dow, Integer = toint(dow/1d)

Output

TimespanInteger
2.00:00:002

The timespan data type

12.81 - dayofyear()

Learn how to use the dayofyear() function to return the day number of the given year.

Returns an integer representing the day number of the given year.

Syntax

dayofyear(date)

Parameters

NameTypeRequiredDescription
datedatetime✔️The datetime for which to determine the day number.

Returns

The day number of the given year.

Example

print result = dayofyear(datetime(2015-12-14))

Output

result
348

12.82 - dcount_hll()

Learn how to use the dcount_hll() function to calculate the distinct count from hyper log log (hll) intermediate calculation results.

Calculates the distinct count from results generated by hll or hll_merge.

Read about the underlying algorithm (HyperLogLog) and estimation accuracy.

Syntax

dcount_hll(hll)

Parameters

NameTypeRequiredDescription
hllstring✔️An expression generated by hll() or hll_merge() to be used to find the distinct count.

Returns

Returns the distinct count of each value in hll.

Example

The following example shows the distinct count of the merged hll results.

StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
| project dcount_hll(hllMerged)

Output

dcount_hll_hllMerged
315

Estimation accuracy

12.83 - degrees()

Learn how to use the degrees() function to convert angle values from radians to values in degrees.

Converts an angle value in radians into a value in degrees, using the formula degrees = (180 / PI) * angle_in_radians.

Syntax

degrees(radians)

Parameters

NameTypeRequiredDescription
radiansreal✔️The angle in radians to convert to degrees.

Returns

The corresponding angle in degrees for an angle specified in radians.

Examples

print degrees0 = degrees(pi()/4), degrees1 = degrees(pi()*1.5), degrees2 = degrees(0)

Output

degrees0degrees1degrees2
452700

12.84 - dynamic_to_json()

Learn how to use the dynamic_to_json() function to convert a scalar value of type dynamic to a canonical string representation.

Converts a scalar value of type dynamic to a canonical string representation.

Syntax

dynamic_to_json(expr)

Parameters

NameTypeRequiredDescription
exprdynamic✔️The expression to convert to string representation.

Returns

Returns a canonical representation of the input as a value of type string, according to the following rules:

  • If the input is a scalar value of type other than dynamic, the output is the application of tostring() to that value.

  • If the input is an array of values, the output is composed of the characters [, ,, and ] interspersed with the canonical representation described here of each array element.

  • If the input is a property bag, the output is composed of the characters {, ,, and } interspersed with the colon (:)-delimited name/value pairs of the properties. The pairs are sorted by the names, and the values are in the canonical representation described here of each array element.

Example

let bag1 = dynamic_to_json(
  dynamic({
    'Y10':dynamic({}),
    'X8': dynamic({
      'c3':1,
      'd8':5,
      'a4':6
    }),
    'D1':114,
    'A1':12,
    'B1':2,
    'C1':3,
    'A14':[15, 13, 18]
}));
let bag2 = dynamic_to_json(
  dynamic({
    'X8': dynamic({
      'a4':6,
      'c3':1,
      'd8':5
    }),
    'A14':[15, 13, 18],
    'C1':3,
    'B1':2,
    'Y10': dynamic({}),
    'A1':12, 'D1':114
  }));
print AreEqual=bag1 == bag2, Result=bag1

Output

AreEqualResult
true{“A1”:12,“A14”:[15,13,18],“B1”:2,“C1”:3,“D1”:114,“X8”:{“a4”:6,“c3”:1,“d8”:5},“Y10”:{}}

12.85 - endofday()

Learn how to use the endofday() function to return a datetime representing the end of the day for the given date value.

Returns the end of the day containing the date, shifted by an offset, if provided.

Syntax

endofday(date [, offset])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date to find the end of.
offsetintThe number of offset days from date. Default is 0.

Returns

A datetime representing the end of the day for the given date value, with the offset, if specified.

Example

  range offset from -1 to 1 step 1
 | project dayEnd = endofday(datetime(2017-01-01 10:10:17), offset) 

Output

dayEnd
2016-12-31 23:59:59.9999999
2017-01-01 23:59:59.9999999
2017-01-02 23:59:59.9999999

12.86 - endofmonth()

Learn how to use the endofmonth() function to return a datetime representing the end of the month for the given date value.

Returns the end of the month containing the date, shifted by an offset, if provided.

Syntax

endofmonth(date [, offset])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date used to find the end of the month.
offsetintThe number of offset months from date. Default is 0.

Returns

A datetime representing the end of the month for the given date value, with the offset, if specified.

Example

  range offset from -1 to 1 step 1
 | project monthEnd = endofmonth(datetime(2017-01-01 10:10:17), offset) 

Output

monthEnd
2016-12-31 23:59:59.9999999
2017-01-31 23:59:59.9999999
2017-02-28 23:59:59.9999999

12.87 - endofweek()

Learn how to use the endofweek() function to return a datetime representing the end of the week for the given date value.

Returns the end of the week containing the date, shifted by an offset, if provided.

Last day of the week is considered to be a Saturday.

Syntax

endofweek(date [, offset])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date used to find the end of the week.
offsetintThe number of offset weeks from date. Default is 0.

Returns

A datetime representing the end of the week for the given date value, with the offset, if specified.

Example

  range offset from -1 to 1 step 1
 | project weekEnd = endofweek(datetime(2017-01-01 10:10:17), offset)  

Output

weekEnd
2016-12-31 23:59:59.9999999
2017-01-07 23:59:59.9999999
2017-01-14 23:59:59.9999999

12.88 - endofyear()

Learn how to use the endofyear() function to return a datetime representing the end of the year for the given date value.

Returns the end of the year containing the date, shifted by an offset, if provided.

Syntax

endofyear(date [, offset])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date used to find the end of the year.
offsetintThe number of offset years from date. Default is 0.

Returns

A datetime representing the end of the year for the given date value, with the offset, if specified.

Example

  range offset from -1 to 1 step 1
 | project yearEnd = endofyear(datetime(2017-01-01 10:10:17), offset) 

Output

yearEnd
2016-12-31 23:59:59.9999999
2017-12-31 23:59:59.9999999
2018-12-31 23:59:59.9999999

12.89 - erf()

This article describes erf() function.

Returns the error function of the input.

Syntax

erf(x)

Parameters

NameTypeRequiredDescription
xreal✔️The value for which to calculate the function.

Returns

Error function of x.

Example

range x from -3 to 3 step 1
| extend erf_x = erf(x)

Output

xerf_x
-3-0.999977909503001
-2-0.995322265018953
-1-0.842700792949715
00
10.842700792949715
20.995322265018953
30.999977909503001

12.90 - erfc()

This article describes erfc() function.

Returns the complementary error function of the input.

Syntax

erfc(x)

Parameters

NameTypeRequiredDescription
xreal✔️The value for which to calculate the function.

Returns

Complementary error function of x.

Example

range x from -3 to 3 step 1
| extend erf_x = erfc(x)

Output

xerf_x
-31.999977909503001
-21.995322265018953
-11.842700792949715
01
10.157299207050285
20.00467773498104727
32.20904969985854E-05

12.91 - estimate_data_size()

Learn how to use the estimate_data_size() function to return an estimated data size in bytes of the selected columns of the tabular expression.

Returns an estimated data size in bytes of the selected columns of the tabular expression.

Syntax

estimate_data_size(columns)

Parameters

NameTypeRequiredDescription
columnsstring✔️One or more comma-separated column references in the source tabular expression to use for data size estimation. To include all columns, use the wildcard (*) character.

Returns

The estimated data size in bytes of the referenced columns. Estimation is based on data types and actual values. For example, the data size for the string '{"a":"bcd"}' is smaller than the dynamic value dynamic({"a":"bcd"}) because the latter’s internal representation is more complex than that of a string.

Example

The following example calculates the total data size using estimate_data_size().

range x from 1 to 10 step 1                    // x (long) is 8 
| extend Text = '1234567890'                   // Text length is 10  
| summarize Total=sum(estimate_data_size(*))   // (8+10)x10 = 180

Output

Total
180

12.92 - exp()

Learn how to use the exp() function to return the base-e exponential value of x.

The base-e exponential function of x, which is e raised to the power x: e^x.

Syntax

exp(x)

Parameters

NameTypeRequiredDescription
xreal✔️The value of the exponent.

Returns

The exponential value of x.

  • For natural (base-e) logarithms, see log().
  • For exponential functions of base-2 and base-10 logarithms, see exp2(), exp10().
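
Example

The following example prints e raised to the power of 1; the result is the mathematical constant e, approximately 2.71828.

print result = exp(1)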

12.93 - exp10()

Learn how to use the exp10() function to return the base-10 exponential value of x.

The base-10 exponential function of x, which is 10 raised to the power x: 10^x.

Syntax

exp10(x)

Parameters

NameTypeRequiredDescription
xreal✔️The value of the exponent.

Returns

The exponential value of x.

  • For base-10 logarithms, see log10().
  • For exponential functions of base-e and base-2 logarithms, see exp(), exp2().
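
Example

The following example prints 10 raised to the power of 2, which is 100.

print result = exp10(2)

Output

result
100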

12.94 - exp2()

Learn how to use the exp2() function to return the base-2 exponential value of x.

The base-2 exponential function of x, which is 2 raised to the power x: 2^x.

Syntax

exp2(x)

Parameters

NameTypeRequiredDescription
xreal✔️The value of the exponent.

Returns

The exponential value of x.

  • For base-2 logarithms, see log2().
  • For exponential functions of base-e and base-10 logarithms, see exp(), exp10().
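
Example

The following example prints 2 raised to the power of 3, which is 8.

print result = exp2(3)

Output

result
8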

12.95 - extent_id()

Learn how to use the extent_id() function to return an identifier of the current record’s data shard.

Returns a unique identifier that identifies the data shard (“extent”) that the current record resides in at the time the query was run.

Applying this function to calculated data that isn’t attached to a data shard returns an empty guid (all zeros).

Syntax

extent_id()

Returns

A value of type guid that identifies the current record’s data shard at the time the query was run, or an empty guid (all zeros).

Example

The following example shows how to get a list of all the data shards that currently have records from an hour ago with a specific value for the column ActivityId. It demonstrates that some query operators (here, the where operator, and also extend and project) preserve the information about the data shard hosting the record.

T
| where Timestamp > ago(1h)
| where ActivityId == 'dd0595d4-183e-494e-b88e-54c52fe90e5a'
| extend eid=extent_id()
| summarize by eid

12.96 - extent_tags()

Learn how to use the extent_tags() function to return a dynamic array of the data shard that the current record is in.

Returns a dynamic array with the extent tags of the extent that the current record is in.

Applying this function to calculated data that isn’t attached to a data shard returns an empty value.

Syntax

extent_tags()

Returns

A value of type dynamic that is an array holding the current record’s extent tags, or an empty value.

Examples

Some query operators preserve the information about the data shard hosting the record. These operators include where, extend, and project. The following example shows how to get a list of the tags of all the data shards that have records from an hour ago, with a specific value for the column ActivityId.

T
| where Timestamp > ago(1h)
| where ActivityId == 'dd0595d4-183e-494e-b88e-54c52fe90e5a'
| extend tags = extent_tags()
| summarize by tostring(tags)

The following example shows how to obtain a count of all records from the last hour that are stored in extents tagged with the tag MyTag (and potentially other tags), but not tagged with the tag drop-by:MyOtherTag.

T
| where Timestamp > ago(1h)
| extend Tags = extent_tags()
| where Tags has_cs 'MyTag' and Tags !has_cs 'drop-by:MyOtherTag'
| count

12.97 - extract_all()

Learn how to use the extract_all() function to extract all matches for a regular expression from a source string.

Get all matches for a regular expression from a source string. Optionally, retrieve a subset of matching groups.

print extract_all(@"(\d+)", "a set of numbers: 123, 567 and 789") // returns the dynamic array ["123", "567", "789"]

Syntax

extract_all(regex, [captureGroups,] source)

Parameters

NameTypeRequiredDescription
regexstring✔️A regular expression containing between one and 16 capture groups.
captureGroupsdynamicAn array that indicates the capture groups to extract. Valid values are from 1 to the number of capturing groups in the regular expression. Named capture groups are allowed as well. See examples.
sourcestring✔️The string to search.

Returns

  • If regex finds a match in source: Returns a dynamic array that includes all matches against the indicated capture groups captureGroups, or all capturing groups in the regex if captureGroups is omitted.
  • If number of captureGroups is 1: The returned array has a single dimension of matched values.
  • If number of captureGroups is more than 1: The returned array is a two-dimensional collection of multi-value matches per captureGroups selection, or all capture groups present in the regex if captureGroups is omitted.
  • If there’s no match: null.

Examples

Extract a single capture group

The following query returns hex-byte representation (two hex-digits) of the GUID.

print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"([\da-f]{2})", Id) 

Output

IDguid_bytes
82b8be2d-dfa7-4bd1-8f63-24ad26d31449[“82”,“b8”,“be”,“2d”,“df”,“a7”,“4b”,“d1”,“8f”,“63”,“24”,“ad”,“26”,“d3”,“14”,“49”]

Extract several capture groups

The following query uses a regular expression with three capturing groups to split each GUID part into first letter, last letter, and whatever is in the middle.

print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"(\w)(\w+)(\w)", Id)

Output

IDguid_bytes
82b8be2d-dfa7-4bd1-8f63-24ad26d31449[[“8”,“2b8be2”,“d”],[“d”,“fa”,“7”],[“4”,“bd”,“1”],[“8”,“f6”,“3”],[“2”,“4ad26d3144”,“9”]]

Extract a subset of capture groups

The following query selects a subset of capturing groups.

The regular expression matches the first letter, last letter, and all the rest.

The captureGroups parameter is used to select only the first and the last parts.

print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"(\w)(\w+)(\w)", dynamic([1,3]), Id) 

Output

IDguid_bytes
82b8be2d-dfa7-4bd1-8f63-24ad26d31449[[“8”,“d”],[“d”,“7”],[“4”,“1”],[“8”,“3”],[“2”,“9”]]

Using named capture groups

The captureGroups in the following query uses both capture group indexes and named capture group references to fetch matching values.

print Id="82b8be2d-dfa7-4bd1-8f63-24ad26d31449"
| extend guid_bytes = extract_all(@"(?P<first>\w)(?P<middle>\w+)(?P<last>\w)", dynamic(['first',2,'last']), Id) 

Output

IDguid_bytes
82b8be2d-dfa7-4bd1-8f63-24ad26d31449[[“8”,“2b8be2”,“d”],[“d”,“fa”,“7”],[“4”,“bd”,“1”],[“8”,“f6”,“3”],[“2”,“4ad26d3144”,“9”]]

12.98 - extract_json()

Learn how to use the extract_json() function to get a specified element out of a JSON text using a path expression.

Get a specified element out of a JSON text using a path expression.

Optionally convert the extracted string to a specific type.

Syntax

extract_json(jsonPath, dataSource, type)

Parameters

NameTypeRequiredDescription
jsonPathstring✔️A JSONPath that defines an accessor into the JSON document.
dataSourcestring✔️A JSON document.
typestringAn optional type literal. If provided, the extracted value is converted to this type. For example, typeof(long) will convert the extracted value to a long.

Performance tips

  • Apply where-clauses before using extract_json().
  • Consider using a regular expression match with extract instead. This can be much faster and is effective if the JSON is produced from a template.
  • Use parse_json() if you need to extract more than one value from the JSON.
  • Consider having the JSON parsed at ingestion by declaring the type of the column to be dynamic.

Returns

This function performs a JSONPath query into dataSource, which contains a valid JSON string, optionally converting that value to another type depending on the third argument.

Example

let json = '{"name": "John", "age": 30, "city": "New York"}';
print extract_json("$.name", json, typeof(string));

Output

print_0
John

12.99 - extract()

Learn how to use the extract() function to get a match for a regular expression from a source string.

Get a match for a regular expression from a source string.

Optionally, convert the extracted substring to the indicated type.

Syntax

extract(regex, captureGroup, source [, typeLiteral])

Parameters

NameTypeRequiredDescription
regexstring✔️A regular expression.
captureGroupint✔️The capture group to extract. 0 stands for the entire match, 1 for the value matched by the first parenthesized group in the regular expression, and 2 or more for subsequent groups.
sourcestring✔️The string to search.
typeLiteralstringIf provided, the extracted substring is converted to this type. For example, typeof(long).

Returns

If regex finds a match in source: the substring matched against the indicated capture group captureGroup, optionally converted to typeLiteral.

If there’s no match, or the type conversion fails: null.

Examples

Extract month from datetime string

The following query extracts the month from the string Dates and returns a table with the date string and the month.

let Dates = datatable(DateString: string)
[
    "15-12-2024",
    "21-07-2023",
    "10-03-2022"
];
Dates
| extend Month = extract(@"-(\d{2})-", 1, DateString, typeof(int))
| project DateString, Month

Output

DateStringMonth
15-12-202412
21-07-20237
10-03-20223

Extract username from a string

The following example returns the username from the string. The regular expression ([^,]+) matches the text following "User: " up to the next comma, effectively extracting the username.

let Text = "User: JohnDoe, Email: johndoe@example.com, Age: 29";
print UserName = extract("User: ([^,]+)", 1, Text)

Output

UserName
JohnDoe

12.100 - format_bytes()

Learn how to use the format_bytes() function to format a number as a string representing the data size in bytes.

Formats a number as a string representing data size in bytes.

Syntax

format_bytes(size [, precision [, units]])

Parameters

NameTypeRequiredDescription
sizereal✔️The value to be formatted as data size in bytes.
precisionintThe number of digits the value will be rounded to after the decimal point. The default is 0.
unitsstringThe units of the target data size: Bytes, KB, MB, GB, TB, PB, or EB. If this parameter is empty, the units will be auto-selected based on input value.

Returns

A string of size formatted as data size in bytes.

Examples

print 
v1 = format_bytes(564),
v2 = format_bytes(10332, 1),
v3 = format_bytes(20010332),
v4 = format_bytes(20010332, 2),
v5 = format_bytes(20010332, 0, "KB")

Output

v1v2v3v4v5
564 Bytes10.1 KB19 MB19.08 MB19541 KB

12.101 - format_datetime()

Learn how to use the format_datetime() function to format a datetime according to the provided format.

Formats a datetime according to the provided format.

Syntax

format_datetime(date , format)

Parameters

NameTypeRequiredDescription
datedatetime✔️The value to format.
formatstring✔️The output format comprised of one or more of the supported format elements.

Supported format elements

The format parameter should include one or more of the following elements:

Format specifierDescriptionExamples
dThe day of the month, from 1 through 31.2009-06-01T13:45:30 -> 1, 2009-06-15T13:45:30 -> 15
ddThe day of the month, from 01 through 31.2009-06-01T13:45:30 -> 01, 2009-06-15T13:45:30 -> 15
fThe tenths of a second in a date and time value.2009-06-15T13:45:30.6170000 -> 6, 2009-06-15T13:45:30.05 -> 0
ffThe hundredths of a second in a date and time value.2009-06-15T13:45:30.6170000 -> 61, 2009-06-15T13:45:30.0050000 -> 00
fffThe milliseconds in a date and time value.6/15/2009 13:45:30.617 -> 617, 6/15/2009 13:45:30.0005 -> 000
ffffThe ten thousandths of a second in a date and time value.2009-06-15T13:45:30.6175000 -> 6175, 2009-06-15T13:45:30.0000500 -> 0000
fffffThe hundred thousandths of a second in a date and time value.2009-06-15T13:45:30.6175400 -> 61754, 2009-06-15T13:45:30.000005 -> 00000
ffffffThe millionths of a second in a date and time value.2009-06-15T13:45:30.6175420 -> 617542, 2009-06-15T13:45:30.0000005 -> 000000
fffffffThe ten millionths of a second in a date and time value.2009-06-15T13:45:30.6175425 -> 6175425, 2009-06-15T13:45:30.0001150 -> 0001150
FIf non-zero, the tenths of a second in a date and time value.2009-06-15T13:45:30.6170000 -> 6, 2009-06-15T13:45:30.0500000 -> (no output)
FFIf non-zero, the hundredths of a second in a date and time value.2009-06-15T13:45:30.6170000 -> 61, 2009-06-15T13:45:30.0050000 -> (no output)
FFFIf non-zero, the milliseconds in a date and time value.2009-06-15T13:45:30.6170000 -> 617, 2009-06-15T13:45:30.0005000 -> (no output)
FFFFIf non-zero, the ten thousandths of a second in a date and time value.2009-06-15T13:45:30.5275000 -> 5275, 2009-06-15T13:45:30.0000500 -> (no output)
FFFFFIf non-zero, the hundred thousandths of a second in a date and time value.2009-06-15T13:45:30.6175400 -> 61754, 2009-06-15T13:45:30.0000050 -> (no output)
FFFFFFIf non-zero, the millionths of a second in a date and time value.2009-06-15T13:45:30.6175420 -> 617542, 2009-06-15T13:45:30.0000005 -> (no output)
FFFFFFFIf non-zero, the ten millionths of a second in a date and time value.2009-06-15T13:45:30.6175425 -> 6175425, 2009-06-15T13:45:30.0001150 -> 000115
hThe hour, using a 12-hour clock from 1 to 12.2009-06-15T01:45:30 -> 1, 2009-06-15T13:45:30 -> 1
hhThe hour, using a 12-hour clock from 01 to 12.2009-06-15T01:45:30 -> 01, 2009-06-15T13:45:30 -> 01
HThe hour, using a 24-hour clock from 0 to 23.2009-06-15T01:45:30 -> 1, 2009-06-15T13:45:30 -> 13
HHThe hour, using a 24-hour clock from 00 to 23.2009-06-15T01:45:30 -> 01, 2009-06-15T13:45:30 -> 13
mThe minute, from 0 through 59.2009-06-15T01:09:30 -> 9, 2009-06-15T13:29:30 -> 29
mmThe minute, from 00 through 59.2009-06-15T01:09:30 -> 09, 2009-06-15T01:45:30 -> 45
MThe month, from 1 through 12.2009-06-15T13:45:30 -> 6
MMThe month, from 01 through 12.2009-06-15T13:45:30 -> 06
sThe second, from 0 through 59.2009-06-15T13:45:09 -> 9
ssThe second, from 00 through 59.2009-06-15T13:45:09 -> 09
yThe year, from 0 to 99.0001-01-01T00:00:00 -> 1, 0900-01-01T00:00:00 -> 0, 1900-01-01T00:00:00 -> 0, 2009-06-15T13:45:30 -> 9, 2019-06-15T13:45:30 -> 19
yyThe year, from 00 to 99.0001-01-01T00:00:00 -> 01, 0900-01-01T00:00:00 -> 00, 1900-01-01T00:00:00 -> 00, 2019-06-15T13:45:30 -> 19
yyyyThe year as a four-digit number.0001-01-01T00:00:00 -> 0001, 0900-01-01T00:00:00 -> 0900, 1900-01-01T00:00:00 -> 1900, 2009-06-15T13:45:30 -> 2009
ttAM / PM hours2009-06-15T13:45:09 -> PM

Supported delimiters

The format specifier can include the following delimiters:

DelimiterComment
' 'Space
'/'
'-'Dash
':'
','
'.'
'_'
'['
']'

Returns

A string with date formatted as specified by format.

Examples

The following three examples return differently formatted datetimes.

let dt = datetime(2017-01-29 09:00:05);
print 
v1=format_datetime(dt,'yy-MM-dd [HH:mm:ss]')

Output

v1
17-01-29 [09:00:05]
let dt = datetime(2017-01-29 09:00:05);
print 
v2=format_datetime(dt, 'yyyy-M-dd [H:mm:ss]')

Output

v2
2017-1-29 [9:00:05]
let dt = datetime(2017-01-29 09:00:05);
print 
v3=format_datetime(dt, 'yy-MM-dd [hh:mm:ss tt]')

Output

v3
17-01-29 [09:00:05 AM]

12.102 - format_ipv4_mask()

Learn how to use the format_ipv4_mask() function to parse the input with a netmask and return a string representing the IPv4 address in CIDR notation.

Parses the input with a netmask and returns a string representing the IPv4 address in CIDR notation.

Syntax

format_ipv4_mask(ip [, prefix])

Parameters

NameTypeRequiredDescription
ipstring✔️The IPv4 address as CIDR notation. The format may be a string or number representation in big-endian order.
prefixintAn integer from 0 to 32 representing the number of most-significant bits that are taken into account. If unspecified, all 32 bits are used.

Returns

If conversion is successful, the result will be a string representing IPv4 address as CIDR notation. If conversion isn’t successful, the result will be an empty string.

Examples

datatable(address:string, mask:long)
[
 '192.168.1.1', 24,          
 '192.168.1.1', 32,          
 '192.168.1.1/24', 32,       
 '192.168.1.1/24', long(-1), 
]
| extend result = format_ipv4(address, mask), 
         result_mask = format_ipv4_mask(address, mask)

Output

addressmaskresultresult_mask
192.168.1.124192.168.1.0192.168.1.0/24
192.168.1.132192.168.1.1192.168.1.1/32
192.168.1.1/2432192.168.1.0192.168.1.0/24
192.168.1.1/24-1

12.103 - format_ipv4()

Learn how to use the format_ipv4() function to parse the input with a netmask and return a string representing the IPv4 address.

Parses the input with a netmask and returns a string representing the IPv4 address.

Syntax

format_ipv4(ip [, prefix])

Parameters

NameTypeRequiredDescription
ipstring✔️The IPv4 address. The format may be a string or number representation in big-endian order.
prefixintAn integer from 0 to 32 representing the number of most-significant bits that are taken into account. If unspecified, all 32 bits are used.

Returns

If conversion is successful, the result will be a string representing IPv4 address. If conversion isn’t successful, the result will be an empty string.

Examples

datatable(address:string, mask:long)
[
 '192.168.1.1', 24,          
 '192.168.1.1', 32,          
 '192.168.1.1/24', 32,       
 '192.168.1.1/24', long(-1), 
]
| extend result = format_ipv4(address, mask), 
         result_mask = format_ipv4_mask(address, mask)

Output

addressmaskresultresult_mask
192.168.1.124192.168.1.0192.168.1.0/24
192.168.1.132192.168.1.1192.168.1.1/32
192.168.1.1/2432192.168.1.0192.168.1.0/24
192.168.1.1/24-1

12.104 - format_timespan()

Learn how to use the format_timespan() function to format a timespan according to the provided format.

Formats a timespan according to the provided format.

Syntax

format_timespan(timespan , format)

Parameters

NameTypeRequiredDescription
timespantimespan✔️The value to format.
formatstring✔️The output format comprised of one or more of the supported format elements.

Supported format elements

Format specifierDescriptionExamples
d-ddddddddThe number of whole days in the time interval. Padded with zeros if needed.15.13:45:30: d -> 15, dd -> 15, ddd -> 015
fThe tenths of a second in the time interval.15.13:45:30.6170000 -> 6, 15.13:45:30.05 -> 0
ffThe hundredths of a second in the time interval.15.13:45:30.6170000 -> 61, 15.13:45:30.0050000 -> 00
fffThe milliseconds in the time interval.6/15/2009 13:45:30.617 -> 617, 6/15/2009 13:45:30.0005 -> 000
ffffThe ten thousandths of a second in the time interval.15.13:45:30.6175000 -> 6175, 15.13:45:30.0000500 -> 0000
fffffThe hundred thousandths of a second in the time interval.15.13:45:30.6175400 -> 61754, 15.13:45:30.000005 -> 00000
ffffffThe millionths of a second in the time interval.15.13:45:30.6175420 -> 617542, 15.13:45:30.0000005 -> 000000
fffffffThe ten millionths of a second in the time interval.15.13:45:30.6175425 -> 6175425, 15.13:45:30.0001150 -> 0001150
FIf non-zero, the tenths of a second in the time interval.15.13:45:30.6170000 -> 6, 15.13:45:30.0500000 -> (no output)
FFIf non-zero, the hundredths of a second in the time interval.15.13:45:30.6170000 -> 61, 15.13:45:30.0050000 -> (no output)
FFFIf non-zero, the milliseconds in the time interval.15.13:45:30.6170000 -> 617, 15.13:45:30.0005000 -> (no output)
FFFFIf non-zero, the ten thousandths of a second in the time interval.15.13:45:30.5275000 -> 5275, 15.13:45:30.0000500 -> (no output)
FFFFFIf non-zero, the hundred thousandths of a second in the time interval.15.13:45:30.6175400 -> 61754, 15.13:45:30.0000050 -> (no output)
FFFFFFIf non-zero, the millionths of a second in the time interval.15.13:45:30.6175420 -> 617542, 15.13:45:30.0000005 -> (no output)
FFFFFFFIf non-zero, the ten millionths of a second in the time interval.15.13:45:30.6175425 -> 6175425, 15.13:45:30.0001150 -> 000115
HThe hour, using a 24-hour clock from 0 to 23.15.01:45:30 -> 1, 15.13:45:30 -> 13
HHThe hour, using a 24-hour clock from 00 to 23.15.01:45:30 -> 01, 15.13:45:30 -> 13
mThe number of whole minutes in the time interval that aren’t included as part of hours or days. Single-digit minutes don’t have a leading zero.15.01:09:30 -> 9, 15.13:29:30 -> 29
mmThe number of whole minutes in the time interval that aren’t included as part of hours or days. Single-digit minutes have a leading zero.15.01:09:30 -> 09, 15.01:45:30 -> 45
sThe number of whole seconds in the time interval that aren’t included as part of hours, days, or minutes. Single-digit seconds don’t have a leading zero.15.13:45:09 -> 9
ssThe number of whole seconds in the time interval that aren’t included as part of hours, days, or minutes. Single-digit seconds have a leading zero.15.13:45:09 -> 09

Supported delimiters

The format specifier can include following delimiters:

DelimiterComment
' 'Space
'/'
'-'Dash
':'
','
'.'
'_'
'['
']'

Returns

A string with timespan formatted as specified by format.

Examples

let t = time(29.09:00:05.12345);
print 
v1=format_timespan(t, 'dd.hh:mm:ss:FF'),
v2=format_timespan(t, 'ddd.h:mm:ss [fffffff]')

Output

v1v2
29.09:00:05:12029.9:00:05 [1234500]

12.105 - gamma()

Learn how to use the gamma() function to compute the gamma of the input parameter.

Computes the gamma function for the provided number.

Syntax

gamma(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The number used to calculate the gamma function.

Returns

Gamma function of number.

For computing log-gamma function, see loggamma().
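
Example

The following example computes the gamma function of a small integer. For a positive integer n, gamma(n) equals (n - 1) factorial, so gamma(5) returns 24.

print result = gamma(5)

Output

result
24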

12.106 - geo_info_from_ip_address()

Learn how to use the geo_info_from_ip_address() function to retrieve geolocation information about IPv4 or IPv6 addresses.

Retrieves geolocation information about IPv4 or IPv6 addresses.

Syntax

geo_info_from_ip_address(IpAddress )

Parameters

NameTypeRequiredDescription
IpAddressstring✔️IPv4 or IPv6 address to retrieve geolocation information about.

Returns

A dynamic object containing the information on IP address whereabouts (if the information is available). The object contains the following fields:

NameTypeDescription
countrystringCountry name
statestringState (subdivision) name
citystringCity name
latituderealLatitude coordinate
longituderealLongitude coordinate

Examples

print ip_location=geo_info_from_ip_address('20.53.203.50')

Output

ip_location
{"country": "Australia", "state": "New South Wales", "city": "Sydney", "latitude": -33.8715, "longitude": 151.2006}
print ip_location=geo_info_from_ip_address('2a03:2880:f12c:83:face:b00c::25de')

Output

ip_location
{"country": "United States", "state": "Florida", "city": "Boca Raton", "latitude": 26.3594, "longitude": -80.0771}

12.107 - gettype()

Learn how to use the gettype() function to return a string representing the runtime type of its single argument.

Returns the runtime type of its single argument.

The runtime type may be different than the nominal (static) type for expressions whose nominal type is dynamic; in such cases gettype() can be useful to reveal the type of the actual value (how the value is encoded in memory).

Syntax

gettype(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value for which to find the type.

Returns

A string representing the runtime type of value.

Examples

ExpressionReturns
gettype("a")string
gettype(111)long
gettype(1==1)bool
gettype(now())datetime
gettype(1s)timespan
gettype(parse_json('1'))int
gettype(parse_json(' "abc" '))string
gettype(parse_json(' {"abc":1} '))dictionary
gettype(parse_json(' [1, 2, 3] '))array
gettype(123.45)real
gettype(guid(12e8b78d-55b4-46ae-b068-26d7a0080254))guid
gettype(parse_json(''))null

12.108 - getyear()

Learn how tow use the getyear() function to return the year of the datetime input.

Returns the year part of the datetime argument.

Syntax

getyear(date)

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to get the year.

Returns

The year that contains the given date.

Example

print year = getyear(datetime(2015-10-12))

Output

year
2015

12.109 - gzip_compress_to_base64_string

Learn how to use the gzip_compress_to_base64_string() function to gzip-compress an input and encode it into a base64 string.

Performs gzip compression and encodes the result to base64.

Syntax

gzip_compress_to_base64_string(string)

Parameters

NameTypeRequiredDescription
stringstring✔️The value to be compressed and base64 encoded. The function accepts only one argument.

Returns

  • Returns a string that represents the gzip-compressed and base64-encoded original string.
  • Returns an empty result if compression or encoding failed.

Example

print res = gzip_compress_to_base64_string("1234567890qwertyuiop")

Output

res
H4sIAAAAAAAA/wEUAOv/MTIzNDU2Nzg5MHF3ZXJ0eXVpb3A6m7f2FAAAAA==

12.110 - gzip_decompress_from_base64_string()

Learn how to use the gzip_decompress_from_base64_string() function to decode an input string from base64 and perform a gzip-decompression.

Decodes the input string from base64 and performs gzip decompression.

Syntax

gzip_decompress_from_base64_string(string)

Parameters

NameTypeRequiredDescription
stringstring✔️The value that was compressed with gzip and then base64-encoded. The function accepts only one argument.

Returns

  • Returns a UTF-8 string that represents the original string.
  • Returns an empty result if decompression or decoding failed.
    • For example, invalid gzip-compressed and base64-encoded strings will return an empty output.

Examples

Valid input

print res=gzip_decompress_from_base64_string("H4sIAAAAAAAA/wEUAOv/MTIzNDU2Nzg5MHF3ZXJ0eXVpb3A6m7f2FAAAAA==")
res
1234567890qwertyuiop

Invalid input

print res=gzip_decompress_from_base64_string("x0x0x0")
res

12.111 - has_any_ipv4_prefix()

Learn how to use the has_any_ipv4_prefix() function to check if any IPv4 address prefixes appear in the text.

Returns a boolean value indicating whether one of specified IPv4 address prefixes appears in a text.

IP address occurrences in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:

  • “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
  • “05:04:54 127.0.0.1 GET /favicon.ico 404”

Performance tips

Syntax

has_any_ipv4_prefix(source , ip_address_prefix [, ip_address_prefix_2, …] )

Parameters

NameTypeRequiredDescription
sourcestring✔️The value to search.
ip_address_prefixstring or dynamic✔️An IP address prefix, or an array of IP address prefixes, for which to search. A valid IP address prefix is either a complete IPv4 address, such as 192.168.1.11, or its prefix ending with a dot, such as 192., 192.168. or 192.168.1..

Returns

true if one of the specified IP address prefixes is a valid IPv4 address prefix and it was found in source. Otherwise, the function returns false.

Examples

IP addresses as list of strings

print result=has_any_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.', '192.168.') // true
result
true

IP addresses as dynamic array

print result=has_any_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', dynamic(["127.0.", "192.168."]))
result
true

Invalid IPv4 prefix

print result=has_any_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0')
result
false

Improperly delimited IP address

print result=has_any_ipv4_prefix('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.', '192.')
result
false

12.112 - has_any_ipv4()

Learn how to use the has_any_ipv4() function to check if any IPv4 addresses appear in the text.

Returns a value indicating whether one of specified IPv4 addresses appears in a text.

IP address occurrences in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:

  • “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
  • “05:04:54 127.0.0.1 GET /favicon.ico 404”

Performance tips

Syntax

has_any_ipv4(source , ip_address [, ip_address_2, …] )

Parameters

NameTypeRequiredDescription
sourcestring✔️The value to search.
ip_addressstring or dynamic✔️An IP address, or an array of IP addresses, for which to search.

Returns

true if one of the specified IP addresses is a valid IPv4 address, and it was found in source. Otherwise, the function returns false.

Examples

IP addresses as list of strings

print result=has_any_ipv4('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.0.1', '127.0.0.2')
result
true

IP addresses as dynamic array

print result=has_any_ipv4('05:04:54 127.0.0.1 GET /favicon.ico 404', dynamic(['127.0.0.1', '127.0.0.2']))
result
true

Invalid IPv4 address

print result=has_any_ipv4('05:04:54 127.0.0.256 GET /favicon.ico 404', dynamic(["127.0.0.256", "192.168.1.1"]))
result
false

Improperly delimited IP address

print result=has_any_ipv4('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.0.1', '192.168.1.1') // false, improperly delimited IP address
result
false

12.113 - has_ipv4_prefix()

Learn how to use the has_ipv4_prefix() function to check if a specified IPv4 address prefix appears in the text.

Returns a value indicating whether a specified IPv4 address prefix appears in a text.

A valid IP address prefix is either a complete IPv4 address (192.168.1.11) or its prefix ending with a dot (192., 192.168. or 192.168.1.).

IP address occurrences in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:

  • “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
  • “05:04:54 127.0.0.1 GET /favicon.ico 404”

Syntax

has_ipv4_prefix(source , ip_address_prefix )

Parameters

NameTypeRequiredDescription
sourcestring✔️The text to search.
ip_address_prefixstring✔️The IP address prefix for which to search.

Returns

true if the ip_address_prefix is a valid IPv4 address prefix, and it was found in source. Otherwise, the function returns false.

Examples

Properly formatted IPv4 prefix

print result=has_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.')
result
true

Invalid IPv4 prefix

print result=has_ipv4_prefix('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0')
result
false

Invalid IPv4 address

print result=has_ipv4_prefix('05:04:54 127.0.0.256 GET /favicon.ico 404', '127.0.')
result
false

Improperly delimited IPv4 address

print result=has_ipv4_prefix('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.')
result
false

12.114 - has_ipv4()

Learn how to use the has_ipv4() function to check if a specified IPv4 address appears in the text.

Returns a value indicating whether a specified IPv4 address appears in a text.

IP address occurrences in the text must be properly delimited with non-alphanumeric characters. For example, properly delimited IP addresses are:

  • “These requests came from: 192.168.1.1, 10.1.1.115 and 10.1.1.201”
  • “05:04:54 127.0.0.1 GET /favicon.ico 404”

Syntax

has_ipv4(source , ip_address )

Parameters

NameTypeRequiredDescription
sourcestring✔️The text to search.
ip_addressstring✔️The value containing the IP address for which to search.

Returns

true if the ip_address is a valid IPv4 address, and it was found in source. Otherwise, the function returns false.

Examples

Properly formatted IP address

print result=has_ipv4('05:04:54 127.0.0.1 GET /favicon.ico 404', '127.0.0.1')

Output

result
true

Invalid IP address

print result=has_ipv4('05:04:54 127.0.0.256 GET /favicon.ico 404', '127.0.0.256')

Output

result
false

Improperly delimited IP

print result=has_ipv4('05:04:54127.0.0.1 GET /favicon.ico 404', '127.0.0.1')

Output

result
false

12.115 - hash_combine()

Learn how to use the hash_combine() function to combine hash values of two or more hashes.

Combines hash values of two or more hashes.

Syntax

hash_combine(h1 , h2 [, h3 …])

Parameters

NameTypeRequiredDescription
h1, h2, … hNlong✔️The hash values to combine.

Returns

The combined hash value of the given scalars.

Examples

print value1 = "Hello", value2 = "World"
| extend h1 = hash(value1), h2=hash(value2)
| extend combined = hash_combine(h1, h2)

Output

value1value2h1h2combined
HelloWorld7536944136985306281846988464401551951-1440138333540407281

12.116 - hash_many()

Learn how to use the hash_many() function to return a combined hash value of multiple values.

Returns a combined hash value of multiple values.

Syntax

hash_many(s1 , s2 [, s3 …])

Parameters

NameTypeRequiredDescription
s1, s2, …, sNscalar✔️The values to hash together.

Returns

The hash() function is applied to each of the specified scalars. The resulting hashes are combined into a single hash and returned.

Examples

print value1 = "Hello", value2 = "World"
| extend combined = hash_many(value1, value2)

Output

value1value2combined
HelloWorld-1440138333540407281

12.117 - hash_md5()

Learn how to use the hash_md5() function to return the MD5 hash value of the input.

Returns an MD5 hash value of the input.

Syntax

hash_md5(source)

Parameters

NameTypeRequiredDescription
sourcescalar✔️The value to be hashed.

Returns

The MD5 hash value of the given scalar, encoded as a hex string (a string of characters, each two of which represent a single Hex number between 0 and 255).

Examples

print 
h1=hash_md5("World"),
h2=hash_md5(datetime(2020-01-01))

Output

h1h2
f5a7924e621e84c9280a9a27e1bcb7f6786c530672d1f8db31fee25ea8a9390b

The following example uses the hash_md5() function to aggregate StormEvents based on State’s MD5 hash value.

StormEvents
| summarize StormCount = count() by State, StateHash=hash_md5(State)
| top 5 by StormCount

Output

StateStateHashStormCount
TEXAS3b00dbe6e07e7485a1c12d36c8e9910a4701
KANSASe1338d0ac8be43846cf9ae967bd02e7f3166
IOWA6d4a7c02942f093576149db764d4e2d22337
ILLINOIS8c00d9e0b3fcd55aed5657e42cc40cf12022
MISSOURI2d82f0c963c0763012b2539d469e50082016

12.118 - hash_sha1()

Learn how to use the hash_sha1() function to return a sha1 hash value of the source input.

Returns a sha1 hash value of the source input.

Syntax

hash_sha1(source)

Parameters

NameTypeRequiredDescription
sourcescalar✔️The value to be hashed.

Returns

The sha1 hash value of the given scalar, encoded as a hex string (a string of characters, each two of which represent a single Hex number between 0 and 255).

Examples

print 
    h1=hash_sha1("World"),
    h2=hash_sha1(datetime(2020-01-01))

Output

h1h2
70c07ec18ef89c5309bbb0937f3a6342411e1fdde903e533f4d636b4fc0dcf3cf81e7b7f330de776

The following example uses the hash_sha1() function to aggregate StormEvents based on State’s SHA1 hash value.

StormEvents 
| summarize StormCount = count() by State, StateHash=hash_sha1(State)
| top 5 by StormCount desc

Output

StateStateHashStormCount
TEXAS3128d805194d4e6141766cc846778eeacb12e3ea4701
KANSASea926e17098148921e472b1a760cd5a8117e84d63166
IOWAcacf86ec119cfd5b574bde5b59604774de3273db2337
ILLINOIS03740763b16dae9d799097f51623fe635d8c48522022
MISSOURI26d938907240121b54d9e039473dacc96e712f612016

12.119 - hash_sha256()

Learn how to use the hash_sha256() function to return a sha256 hash value of the source input.

Returns a sha256 hash value of the source input.

Syntax

hash_sha256(source)

Parameters

NameTypeRequiredDescription
sourcescalar✔️The value to be hashed.

Returns

The sha256 hash value of the given scalar, encoded as a hex string (a string of characters, each two of which represent a single Hex number between 0 and 255).

Examples

print 
    h1=hash_sha256("World"),
    h2=hash_sha256(datetime(2020-01-01))

Output

h1h2
78ae647dc5544d227130a0682a51e30bc7777fbb6d8a8f17007463a3ecd1d524ba666752dc1a20eb750b0eb64e780cc4c968bc9fb8813461c1d7e750f302d71d

The following example uses the hash_sha256() function to aggregate StormEvents based on State’s SHA256 hash value.

StormEvents 
| summarize StormCount = count() by State, StateHash=hash_sha256(State)
| top 5 by StormCount desc

Output

StateStateHashStormCount
TEXAS9087f20f23f91b5a77e8406846117049029e6798ebbd0d38aea68da73a00ca374701
KANSASc80e328393541a3181b258cdb4da4d00587c5045e8cf3bb6c8fdb7016b69cc2e3166
IOWAf85893dca466f779410f65cd904fdc4622de49e119ad4e7c7e4a291ceed1820b2337
ILLINOISae3eeabfd7eba3d9a4ccbfed6a9b8cff269dc43255906476282e0184cf81b7fd2022
MISSOURId15dfc28abc3ee73b7d1f664a35980167ca96f6f90e034db2a6525c0b8ba61b12016

12.120 - hash_xxhash64()

Learn how to use the hash_xxhash64() function to return the xxhash64 value of the input.

Returns an xxhash64 value for the input value.

Syntax

hash_xxhash64(source [, mod])

Parameters

NameTypeRequiredDescription
sourcescalar✔️The value to be hashed.
modintA modulo value to be applied to the hash result, so that the output value is between 0 and mod - 1. This parameter is useful for limiting the range of possible output values or for compressing the output of the hash function into a smaller range.

Returns

The hash value of source. If mod is specified, the function returns the hash value modulo the value of mod, meaning that the output of the function will be the remainder of the hash value divided by mod. The output will be a value between 0 and mod - 1, inclusive.

Examples

String input

print result=hash_xxhash64("World")
result
1846988464401551951

String input with mod

print result=hash_xxhash64("World", 100)
result
51

Datetime input

print result=hash_xxhash64(datetime("2015-01-01"))
result
1380966698541616202

12.121 - hash()

Learn how to use the hash() function to return the hash value of the input.

Returns a hash value for the input value.

Syntax

hash(source [, mod])

Parameters

NameTypeRequiredDescription
sourcescalar✔️The value to be hashed.
modintA modulo value to be applied to the hash result, so that the output value is between 0 and mod - 1. This parameter is useful for limiting the range of possible output values or for compressing the output of the hash function into a smaller range.

Returns

The hash value of source. If mod is specified, the function returns the hash value modulo the value of mod, meaning that the output of the function will be the remainder of the hash value divided by mod. The output will be a value between 0 and mod - 1, inclusive.

Examples

String input

print result=hash("World")
result
1846988464401551951

String input with mod

print result=hash("World", 100)
result
51

Datetime input

print result=hash(datetime("2015-01-01"))
result
1380966698541616202

Use hash to check data distribution

Use the hash() function for sampling data if the values in one of its columns is uniformly distributed. In the following example, StartTime values are uniformly distributed and the function is used to run a query on 10% of the data.

StormEvents 
| where hash(StartTime, 10) == 0
| summarize StormCount = count(), TypeOfStorms = dcount(EventType) by State 
| top 5 by StormCount desc

12.122 - hll_merge()

Learn how to use the hll_merge() function to merge HLL results.

Merges HLL results. This is the scalar version of the aggregate function hll_merge().

Read about the underlying algorithm (HyperLogLog) and estimation accuracy.

Syntax

hll_merge( hll, hll2, [ hll3, … ])

Parameters

NameTypeRequiredDescription
hll, hll2, …string✔️The column names containing HLL values to merge. The function expects between 2-64 arguments.

Returns

Returns one HLL value. The value is the result of merging the columns hll, hll2, … hllN.

Examples

This example shows the value of the merged columns.

range x from 1 to 10 step 1 
| extend y = x + 10
| summarize hll_x = hll(x), hll_y = hll(y)
| project merged = hll_merge(hll_x, hll_y)
| project dcount_hll(merged)

Output

dcount_hll_merged
20

Estimation accuracy

12.123 - hourofday()

Learn how to use the hourofday() function to return an integer representing the hour of the given date.

Returns the integer number representing the hour number of the given date.

Syntax

hourofday(date)

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to return the hour number.

Returns

An integer between 0-23 representing the hour number of the day for date.

Example

print hour=hourofday(datetime(2015-12-14 18:54))
hour
18

12.124 - iff()

Learn how to use the iff() function to return one of two values depending on whether a condition evaluates to true or false.

Returns the then value when the if condition evaluates to true, otherwise it returns the else value.

Syntax

iff(if, then, else)

Parameters

NameTypeRequiredDescription
ifstring✔️An expression that evaluates to a boolean value.
thenscalar✔️An expression that returns its value when the if condition evaluates to true.
elsescalar✔️An expression that returns its value when the if condition evaluates to false.

Returns

This function returns the then value when the if condition evaluates to true, otherwise it returns the else value.

Examples

Classify data using iff()

The following query uses the iff() function to categorize storm events as either “Rain event” or “Not rain event” based on their event type, and then projects the state, event ID, event type, and the new rain category.

StormEvents
| extend Rain = iff((EventType in ("Heavy Rain", "Flash Flood", "Flood")), "Rain event", "Not rain event")
| project State, EventId, EventType, Rain

Output

The following table shows only the first five rows.

StateEventIdEventTypeRain
ATLANTIC SOUTH61032WaterspoutNot rain event
FLORIDA60904Heavy RainRain event
FLORIDA60913TornadoNot rain event
GEORGIA64588Thunderstorm WindNot rain event
MISSISSIPPI68796Thunderstorm WindNot rain event

Combine iff() with other functions

The following query calculates the total damage from crops and property, categorizes the severity of storm events based on total damage, direct injuries, and direct deaths, and then summarizes the total number of events and the number of events by severity.

StormEvents
| extend TotalDamage = DamageCrops + DamageProperty
| extend Severity = iff(TotalDamage > 1000000 or InjuriesDirect > 10 or DeathsDirect > 0, "High", iff(TotalDamage < 50000 and InjuriesDirect == 0 and DeathsDirect == 0, "Low", "Moderate"))
| summarize TotalEvents = count(), SeverityEvents = count() by Severity

Output

SeverityTotalEvents
Low54805
High977
Moderate3284

12.125 - indexof_regex()

Learn how to use the indexof_regex() function to return the zero-based index position of a regex input.

Returns the zero-based index of the first occurrence of a specified lookup regular expression within the input string.

See indexof().

Syntax

indexof_regex(string,match[,start[,length[,occurrence]]])

Parameters

NameTypeRequiredDescription
stringstring✔️The source string to search.
matchstring✔️The regular expression lookup string.
startintThe search start position. A negative value will offset the starting search position from the end of the string by this many steps: abs(start).
lengthintThe number of character positions to examine. A value of -1 means unlimited length.
occurrenceintThe number of the occurrence. The default is 1.

Returns

The zero-based index position of match.

  • Returns -1 if match isn’t found in string.
  • Returns null if:
    • start is less than 0.
    • occurrence is less than 0.
    • length is less than -1.

Examples

print
    idx1 = indexof_regex("abcabc", @"a.c"), // lookup found in input string
    idx2 = indexof_regex("abcabcdefg", @"a.c", 0, 9, 2),  // lookup found in input string
    idx3 = indexof_regex("abcabc", @"a.c", 1, -1, 2),  // there's no second occurrence in the search range
    idx4 = indexof_regex("ababaa", @"a.a", 0, -1, 2), // Matches don't overlap so full lookup can't be found 
    idx5 = indexof_regex("abcabc", @"a|ab", -1)  // invalid start argument

Output

idx1idx2idx3idx4idx5
03-1-1

12.126 - indexof()

Learn how to use the indexof() function to report the zero-based index position of the input string.

Reports the zero-based index of the first occurrence of a specified string within the input string.

For more information, see indexof_regex().

Syntax

indexof(string,match[,start[,length[,occurrence]]])

Parameters

NameTypeRequiredDescription
stringstring✔️The source string to search.
matchstring✔️The string for which to search.
startintThe search start position. A negative value will offset the starting search position from the end of the string by this many steps: abs(start).
lengthintThe number of character positions to examine. A value of -1 means unlimited length.
occurrenceintThe number of the occurrence. The default is 1.

Returns

The zero-based index position of match.

  • Returns -1 if match isn’t found in string.
  • Returns null if:
    • start is less than 0.
    • occurrence is less than 0.
    • length is less than -1.

Examples

print
 idx1 = indexof("abcdefg","cde")    // lookup found in input string
 , idx2 = indexof("abcdefg","cde",1,4) // lookup found in researched range 
 , idx3 = indexof("abcdefg","cde",1,2) // search starts from index 1, but stops after 2 chars, so full lookup can't be found
 , idx4 = indexof("abcdefg","cde",3,4) // search starts after occurrence of lookup
 , idx5 = indexof("abcdefg","cde",-5)  // negative start index
 , idx6 = indexof(1234567,5,1,4)       // two first parameters were forcibly casted to strings "12345" and "5"
 , idx7 = indexof("abcdefg","cde",2,-1)  // lookup found in input string
 , idx8 = indexof("abcdefgabcdefg", "cde", 1, 10, 2)   // lookup found in input range
 , idx9 = indexof("abcdefgabcdefg", "cde", 1, -1, 3)   // the third occurrence of lookup is not in researched range

Output

idx1idx2idx3idx4idx5idx6idx7idx8idx9
22-1-12429-1

12.127 - ingestion_time()

Learn how to use the ingestion_time() function to return the approximate time of the data’s ingestion.

Returns the approximate datetime in UTC format indicating when the current record was ingested.

This function must be used in the context of a table or a materialized view. Otherwise, this function produces null values.

If the IngestionTime policy wasn't enabled when the data was ingested, the function returns null values.

Retrieves the datetime when the record was ingested and ready for query.

Syntax

ingestion_time()

Returns

A datetime value specifying the approximate time of ingestion into a table.

Example

T
| extend ingestionTime = ingestion_time() | top 10 by ingestionTime
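
A common use is estimating ingestion latency by comparing the ingestion time with a timestamp column in the data. A minimal sketch, assuming a table T with a datetime column named Timestamp (both names are illustrative):

T
| extend latency = ingestion_time() - Timestamp // Timestamp is a hypothetical datetime column
| summarize avg(latency), max(latency)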

12.128 - ipv4_compare()

Learn how to use the ipv4_compare() function to compare two IPv4 strings.

Compares two IPv4 strings. The two IPv4 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional PrefixMask argument.

Syntax

ipv4_compare(Expr1,Expr2[ ,PrefixMask])

Parameters

NameTypeRequiredDescription
Expr1, Expr2string✔️A string expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation.
PrefixMaskintAn integer from 0 to 32 representing the number of most-significant bits that are taken into account.

Returns

  • 0: If the long representation of the first IPv4 string argument is equal to the second IPv4 string argument
  • 1: If the long representation of the first IPv4 string argument is greater than the second IPv4 string argument
  • -1: If the long representation of the first IPv4 string argument is less than the second IPv4 string argument
  • null: If conversion for one of the two IPv4 strings wasn’t successful.

Examples: IPv4 comparison equality cases

Compare IPs using the IP-prefix notation specified inside the IPv4 strings

datatable(ip1_string:string, ip2_string:string)
[
 '192.168.1.0',    '192.168.1.0',       // Equal IPs
 '192.168.1.1/24', '192.168.1.255',     // 24 bit IP-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255/24',  // 24 bit IP-prefix is used for comparison
 '192.168.1.1/30', '192.168.1.255/24',  // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_compare(ip1_string, ip2_string)

Output

ip1_stringip2_stringresult
192.168.1.0192.168.1.00
192.168.1.1/24192.168.1.2550
192.168.1.1192.168.1.255/240
192.168.1.1/30192.168.1.255/240

Compare IPs using IP-prefix notation specified inside the IPv4 strings and as additional argument of the ipv4_compare() function

datatable(ip1_string:string, ip2_string:string, prefix:long)
[
 '192.168.1.1',    '192.168.1.0',   31, // 31 bit IP-prefix is used for comparison
 '192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255', 24, // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_compare(ip1_string, ip2_string, prefix)

Output

ip1_stringip2_stringprefixresult
192.168.1.1192.168.1.0310
192.168.1.1/24192.168.1.255310
192.168.1.1192.168.1.255240

12.129 - ipv4_is_in_any_range()

Learn how to use the ipv4_is_in_any_range() function to check if the IPv4 string address is in any of the IPv4 address ranges.

Checks whether IPv4 string address is in any of the specified IPv4 address ranges.

Performance tips

Syntax

ipv4_is_in_any_range(Ipv4Address , Ipv4Range [ , Ipv4Range …] )

ipv4_is_in_any_range(Ipv4Address , Ipv4Ranges )

Parameters

NameTypeRequiredDescription
Ipv4Addressstring✔️An expression representing an IPv4 address.
Ipv4Rangestring✔️An IPv4 range or list of IPv4 ranges written with IP-prefix notation.
Ipv4Rangesdynamic✔️A dynamic array containing IPv4 ranges written with IP-prefix notation.

Returns

  • true: If the IPv4 address is in the range of any of the specified IPv4 networks.
  • false: Otherwise.
  • null: If conversion for one of the two IPv4 strings wasn’t successful.

Examples

Syntax using list of strings

print Result=ipv4_is_in_any_range('192.168.1.6', '192.168.1.1/24', '10.0.0.1/8', '127.1.0.1/16')

Output

Result
true

Syntax using dynamic array

print Result=ipv4_is_in_any_range("127.0.0.1", dynamic(["127.0.0.1", "192.168.1.1"]))

Output

Result
true

Extend table with IPv4 range check

let LocalNetworks=dynamic([
    "192.168.1.1/16",
    "127.0.0.1/8",
    "10.0.0.1/8"
]);
let IPs=datatable(IP:string) [
    "10.1.2.3",
    "192.168.1.5",
    "123.1.11.21",
    "1.1.1.1"
];
IPs
| extend IsLocal=ipv4_is_in_any_range(IP, LocalNetworks)

Output

IPIsLocal
10.1.2.3true
192.168.1.5true
123.1.11.21false
1.1.1.1false

12.130 - ipv4_is_in_range()

Learn how to use the ipv4_is_in_range() function to check if the IPv4 string address is in the IPv4-prefix notation range.

Checks if IPv4 string address is in IPv4-prefix notation range.

Syntax

ipv4_is_in_range(Ipv4Address,Ipv4Range)

Parameters

NameTypeRequiredDescription
Ipv4Addressstring✔️An expression representing an IPv4 address.
Ipv4Rangestring✔️An IPv4 range or list of IPv4 ranges written with IP-prefix notation.

Returns

  • true: If the long representation of the first IPv4 string argument is in range of the second IPv4 string argument.
  • false: Otherwise.
  • null: If conversion for one of the two IPv4 strings wasn’t successful.

Example

datatable(ip_address:string, ip_range:string)
[
 '192.168.1.1',    '192.168.1.1',       // Equal IPs
 '192.168.1.1',    '192.168.1.255/24',  // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_is_in_range(ip_address, ip_range)

Output

ip_addressip_rangeresult
192.168.1.1192.168.1.1true
192.168.1.1192.168.1.255/24true

12.131 - ipv4_is_match()

Learn how to use the ipv4_is_match() function to match two IPv4 strings.

Matches two IPv4 strings. The two IPv4 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional prefix argument.

Syntax

ipv4_is_match(ip1,ip2[ ,prefix])

Parameters

NameTypeRequiredDescription
ip1, ip2string✔️An expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation.
prefixintAn integer from 0 to 32 representing the number of most-significant bits that are taken into account.

Returns

  • true: If the long representation of the first IPv4 string argument is equal to the second IPv4 string argument.
  • false: Otherwise.
  • null: If conversion for one of the two IPv4 strings wasn’t successful.

Examples

Simple example

print ipv4_is_match('192.168.1.1/24', '192.168.1.255')

Output

print_0
true

IPv4 comparison equality - IP-prefix notation specified inside the IPv4 strings

datatable(ip1_string:string, ip2_string:string)
[
 '192.168.1.0',    '192.168.1.0',       // Equal IPs
 '192.168.1.1/24', '192.168.1.255',     // 24 bit IP-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255/24',  // 24 bit IP-prefix is used for comparison
 '192.168.1.1/30', '192.168.1.255/24',  // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_is_match(ip1_string, ip2_string)

Output

ip1_stringip2_stringresult
192.168.1.0192.168.1.0true
192.168.1.1/24192.168.1.255true
192.168.1.1192.168.1.255/24true
192.168.1.1/30192.168.1.255/24true

IPv4 comparison equality - IP-prefix notation specified inside the IPv4 strings and an additional argument of the ipv4_is_match() function

datatable(ip1_string:string, ip2_string:string, prefix:long)
[
 '192.168.1.1',    '192.168.1.0',   31, // 31 bit IP-prefix is used for comparison
 '192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255', 24, // 24 bit IP-prefix is used for comparison
]
| extend result = ipv4_is_match(ip1_string, ip2_string, prefix)

Output

ip1_stringip2_stringprefixresult
192.168.1.1192.168.1.031true
192.168.1.1/24192.168.1.25531true
192.168.1.1192.168.1.25524true

12.132 - ipv4_is_private()

Learn how to use the ipv4_is_private() function to check if the IPv4 string address belongs to a set of private network IPs.

Checks if the IPv4 string address belongs to a set of private network IPs.

Private network addresses were originally defined to help delay IPv4 address exhaustion. IP packets originating from or addressed to a private IP address can’t be routed through the public internet.

Private IPv4 addresses

The Internet Engineering Task Force (IETF) has directed the Internet Assigned Numbers Authority (IANA) to reserve the following IPv4 address ranges for private networks:

IP address rangeNumber of addressesLargest CIDR block (subnet mask)
10.0.0.0 – 10.255.255.2551677721610.0.0.0/8 (255.0.0.0)
172.16.0.0 – 172.31.255.2551048576172.16.0.0/12 (255.240.0.0)
192.168.0.0 – 192.168.255.25565536192.168.0.0/16 (255.255.0.0)
ipv4_is_private('192.168.1.1/24') == true
ipv4_is_private('10.1.2.3/24') == true
ipv4_is_private('202.1.2.3') == false
ipv4_is_private("127.0.0.1") == false

Syntax

ipv4_is_private(ip)

Parameters

NameTypeRequiredDescription
ipstring✔️An expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation.

Returns

  • true: If the IPv4 address belongs to any of the private network ranges.
  • false: Otherwise.
  • null: If parsing of the input as IPv4 address string wasn’t successful.

Example: Check if IPv4 belongs to a private network

datatable(ip_string:string)
[
 '10.1.2.3',
 '192.168.1.1/24',
 '127.0.0.1',
]
| extend result = ipv4_is_private(ip_string)

Output

ip_stringresult
10.1.2.3true
192.168.1.1/24true
127.0.0.1false

12.133 - ipv4_netmask_suffix()

Learn how to use the ipv4_netmask_suffix() function to return the value of the IPv4 netmask suffix from an IPv4 string address.

Returns the value of the IPv4 netmask suffix from an IPv4 string address.

Syntax

ipv4_netmask_suffix(ip)

Parameters

NameTypeRequiredDescription
ipstring✔️An expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation.

Returns

  • The value of the netmask suffix of the IPv4 address. If the suffix isn't present in the input, a value of 32 (full netmask suffix) is returned.
  • null: If parsing the input as an IPv4 address string wasn’t successful.

Example: Resolve IPv4 mask suffix

datatable(ip_string:string)
[
 '10.1.2.3',
 '192.168.1.1/24',
 '127.0.0.1/16',
]
| extend cidr_suffix = ipv4_netmask_suffix(ip_string)

Output

ip_stringcidr_suffix
10.1.2.332
192.168.1.1/2424
127.0.0.1/1616

12.134 - ipv4_range_to_cidr_list()

Learn how to use the ipv4_range_to_cidr_list() function to convert IPv4 address range to a list of CIDR ranges.

Converts an IPv4 address range, denoted by starting and ending IPv4 addresses, to a list of IPv4 ranges in CIDR notation.

Syntax

ipv4_range_to_cidr_list(StartAddress , EndAddress )

Parameters

NameTypeRequiredDescription
StartAddressstring✔️An expression representing a starting IPv4 address of the range.
EndAddressstring✔️An expression representing an ending IPv4 address of the range.

Returns

A dynamic array object containing the list of ranges in CIDR notation.

Examples

print start_IP="1.1.128.0", end_IP="1.1.140.255"
 | project ipv4_range_list = ipv4_range_to_cidr_list(start_IP, end_IP)

Output

ipv4_range_list
["1.1.128.0/21", "1.1.136.0/22","1.1.140.0/24"]

12.135 - ipv6_compare()

Learn how to use the ipv6_compare() function to compare two IPv6 or IPv4 network address strings.

Compares two IPv6 or IPv4 network address strings. The two IPv6 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional prefix argument.

Syntax

ipv6_compare(ip1,ip2[ ,prefix])

Parameters

NameTypeRequiredDescription
ip1, ip2string✔️An expression representing an IPv6 or IPv4 address. IPv6 and IPv4 strings can be masked using IP-prefix notation.
prefixintAn integer from 0 to 128 representing the number of most significant bits that are taken into account.

Returns

  • 0: If the long representation of the first IPv6 string argument is equal to the second IPv6 string argument.
  • 1: If the long representation of the first IPv6 string argument is greater than the second IPv6 string argument.
  • -1: If the long representation of the first IPv6 string argument is less than the second IPv6 string argument.
  • null: If conversion for one of the two IPv6 strings wasn’t successful.

Examples: IPv6/IPv4 comparison equality cases

Compare IPs using the IP-prefix notation specified inside the IPv6/IPv4 strings

datatable(ip1_string:string, ip2_string:string)
[
 // IPv4 are compared as IPv6 addresses
 '192.168.1.1',    '192.168.1.1',       // Equal IPs
 '192.168.1.1/24', '192.168.1.255',     // 24 bit IP4-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255/24',  // 24 bit IP4-prefix is used for comparison
 '192.168.1.1/30', '192.168.1.255/24',  // 24 bit IP4-prefix is used for comparison
  // IPv6 cases
 'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7994',         // Equal IPs
 'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998',     // 120 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7998/120',     // 120 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998/120', // 120 bit IP6-prefix is used for comparison
 // Mixed case of IPv4 and IPv6
 '192.168.1.1',      '::ffff:c0a8:0101', // Equal IPs
 '192.168.1.1/24',   '::ffff:c0a8:01ff', // 24 bit IP-prefix is used for comparison
 '::ffff:c0a8:0101', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
 '::192.168.1.1/30', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
]
| extend result = ipv6_compare(ip1_string, ip2_string)

Output

ip1_stringip2_stringresult
192.168.1.1192.168.1.10
192.168.1.1/24192.168.1.2550
192.168.1.1192.168.1.255/240
192.168.1.1/30192.168.1.255/240
fe80::85d:e82c:9446:7994fe80::85d:e82c:9446:79940
fe80::85d:e82c:9446:7994/120fe80::85d:e82c:9446:79980
fe80::85d:e82c:9446:7994fe80::85d:e82c:9446:7998/1200
fe80::85d:e82c:9446:7994/120fe80::85d:e82c:9446:7998/1200
192.168.1.1::ffff:c0a8:01010
192.168.1.1/24::ffff:c0a8:01ff0
::ffff:c0a8:0101192.168.1.255/240
::192.168.1.1/30192.168.1.255/240

Compare IPs using IP-prefix notation specified inside the IPv6/IPv4 strings and as additional argument of the ipv6_compare() function

datatable(ip1_string:string, ip2_string:string, prefix:long)
[
 // IPv4 are compared as IPv6 addresses 
 '192.168.1.1',    '192.168.1.0',   31, // 31 bit IP4-prefix is used for comparison
 '192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP4-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255', 24, // 24 bit IP4-prefix is used for comparison
   // IPv6 cases
 'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995',     127, // 127 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7998', 120, // 120 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998', 127, // 120 bit IP6-prefix is used for comparison
 // Mixed case of IPv4 and IPv6
 '192.168.1.1/24',   '::ffff:c0a8:01ff', 127, // 127 bit IP6-prefix is used for comparison
 '::ffff:c0a8:0101', '192.168.1.255',    120, // 120 bit IP6-prefix is used for comparison
 '::192.168.1.1/30', '192.168.1.255/24', 127, // 120 bit IP6-prefix is used for comparison
]
| extend result = ipv6_compare(ip1_string, ip2_string, prefix)

Output

ip1_stringip2_stringprefixresult
192.168.1.1192.168.1.0310
192.168.1.1/24192.168.1.255310
192.168.1.1192.168.1.255240
fe80::85d:e82c:9446:7994fe80::85d:e82c:9446:79951270
fe80::85d:e82c:9446:7994/127fe80::85d:e82c:9446:79981200
fe80::85d:e82c:9446:7994/120fe80::85d:e82c:9446:79981270
192.168.1.1/24::ffff:c0a8:01ff1270
::ffff:c0a8:0101192.168.1.2551200
::192.168.1.1/30192.168.1.255/241270

12.136 - ipv6_is_in_any_range()

Learn how to use the ipv6_is_in_any_range function to check if an IPv6 string address is in any of the IPv6 address ranges.

Checks whether an IPv6 string address is in any of the specified IPv6 address ranges.

Performance tips

Syntax

ipv6_is_in_any_range(Ipv6Address , Ipv6Range [ , Ipv6Range …] )

ipv6_is_in_any_range(Ipv6Address , Ipv6Ranges )

Parameters

NameTypeRequiredDescription
Ipv6Addressstring✔️An expression representing an IPv6 address.
Ipv6Rangestring✔️An expression representing an IPv6 range using IP-prefix notation.
Ipv6Rangesdynamic✔️An array containing IPv6 ranges using IP-prefix notation.

Returns

  • true: If the IPv6 address is in the range of any of the specified IPv6 networks.
  • false: Otherwise.
  • null: If conversion for one of the two IPv6 strings wasn’t successful.

Example

let LocalNetworks=dynamic([
    "a5e:f127:8a9d:146d:e102:b5d3:c755:f6cd/112",
    "0:0:0:0:0:ffff:c0a8:ac/60"
]);
let IPs=datatable(IP:string) [
    "a5e:f127:8a9d:146d:e102:b5d3:c755:abcd",
    "a5e:f127:8a9d:146d:e102:b5d3:c755:abce",
    "a5e:f127:8a9d:146d:e102:b5d3:c755:abcf",
    "a5e:f127:8a9d:146d:e102:b5d3:c756:abd1",
];
IPs
| extend IsLocal=ipv6_is_in_any_range(IP, LocalNetworks)

Output

IPIsLocal
a5e:f127:8a9d:146d:e102:b5d3:c755:abcdTrue
a5e:f127:8a9d:146d:e102:b5d3:c755:abceTrue
a5e:f127:8a9d:146d:e102:b5d3:c755:abcfTrue
a5e:f127:8a9d:146d:e102:b5d3:c756:abd1False

12.137 - ipv6_is_in_range()

Learn how to use the ipv6_is_in_range() function to check if an IPv6 string address is in the Ipv6-prefix notation range.

Checks if an IPv6 string address is in the IPv6-prefix notation range.

Syntax

ipv6_is_in_range(Ipv6Address,Ipv6Range)

Parameters

NameTypeRequiredDescription
Ipv6Addressstring✔️An expression representing an IPv6 address.
Ipv6Rangestring✔️An expression representing an IPv6 range using IP-prefix notation.

Returns

  • true: If the long representation of the first IPv6 string argument is in range of the second IPv6 string argument.
  • false: Otherwise.
  • null: If conversion for one of the two IPv6 strings wasn’t successful.

Example

datatable(ip_address:string, ip_range:string)
[
 'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd',    'a5e:f127:8a9d:146d:e102:b5d3:c755:0000/112',
 'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd',    'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd',
 'a5e:f127:8a9d:146d:e102:b5d3:c755:abcd',    '0:0:0:0:0:ffff:c0a8:ac/60',
]
| extend result = ipv6_is_in_range(ip_address, ip_range)

Output

ip_addressip_rangeresult
a5e:f127:8a9d:146d:e102:b5d3:c755:abcda5e:f127:8a9d:146d:e102:b5d3:c755:0000/112True
a5e:f127:8a9d:146d:e102:b5d3:c755:abcda5e:f127:8a9d:146d:e102:b5d3:c755:abcdTrue
a5e:f127:8a9d:146d:e102:b5d3:c755:abcd0:0:0:0:0:ffff:c0a8:ac/60False

12.138 - ipv6_is_match()

Learn how to use the ipv6_is_match() function to match two IPv6 or IPv4 network address strings.

Matches two IPv6 or IPv4 network address strings. The two IPv6/IPv4 strings are parsed and compared while accounting for the combined IP-prefix mask calculated from argument prefixes, and the optional prefix argument.

Syntax

ipv6_is_match(ip1,ip2[ ,prefix])

Parameters

NameTypeRequiredDescription
ip1, ip2string✔️An expression representing an IPv6 or IPv4 address. IPv6 and IPv4 strings can be masked using IP-prefix notation.
prefixintAn integer from 0 to 128 representing the number of most-significant bits that are taken into account.

Returns

  • true: If the long representation of the first IPv6/IPv4 string argument is equal to the second IPv6/IPv4 string argument.
  • false: Otherwise.
  • null: If conversion for one of the two IPv6/IPv4 strings wasn’t successful.

Examples

IPv6/IPv4 comparison equality case - IP-prefix notation specified inside the IPv6/IPv4 strings

datatable(ip1_string:string, ip2_string:string)
[
 // IPv4 are compared as IPv6 addresses
 '192.168.1.1',    '192.168.1.1',       // Equal IPs
 '192.168.1.1/24', '192.168.1.255',     // 24 bit IP4-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255/24',  // 24 bit IP4-prefix is used for comparison
 '192.168.1.1/30', '192.168.1.255/24',  // 24 bit IP4-prefix is used for comparison
  // IPv6 cases
 'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7994',         // Equal IPs
 'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998',     // 120 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7998/120',     // 120 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998/120', // 120 bit IP6-prefix is used for comparison
 // Mixed case of IPv4 and IPv6
 '192.168.1.1',      '::ffff:c0a8:0101', // Equal IPs
 '192.168.1.1/24',   '::ffff:c0a8:01ff', // 24 bit IP-prefix is used for comparison
 '::ffff:c0a8:0101', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
 '::192.168.1.1/30', '192.168.1.255/24', // 24 bit IP-prefix is used for comparison
]
| extend result = ipv6_is_match(ip1_string, ip2_string)

Output

ip1_stringip2_stringresult
192.168.1.1192.168.1.11
192.168.1.1/24192.168.1.2551
192.168.1.1192.168.1.255/241
192.168.1.1/30192.168.1.255/241
fe80::85d:e82c:9446:7994fe80::85d:e82c:9446:79941
fe80::85d:e82c:9446:7994/120fe80::85d:e82c:9446:79981
fe80::85d:e82c:9446:7994fe80::85d:e82c:9446:7998/1201
fe80::85d:e82c:9446:7994/120fe80::85d:e82c:9446:7998/1201
192.168.1.1::ffff:c0a8:01011
192.168.1.1/24::ffff:c0a8:01ff1
::ffff:c0a8:0101192.168.1.255/241
::192.168.1.1/30192.168.1.255/241

IPv6/IPv4 comparison equality case- IP-prefix notation specified inside the IPv6/IPv4 strings and as additional argument of the ipv6_is_match() function

datatable(ip1_string:string, ip2_string:string, prefix:long)
[
 // IPv4 are compared as IPv6 addresses 
 '192.168.1.1',    '192.168.1.0',   31, // 31 bit IP4-prefix is used for comparison
 '192.168.1.1/24', '192.168.1.255', 31, // 24 bit IP4-prefix is used for comparison
 '192.168.1.1',    '192.168.1.255', 24, // 24 bit IP4-prefix is used for comparison
   // IPv6 cases
 'fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995',     127, // 127 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7998', 120, // 120 bit IP6-prefix is used for comparison
 'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998', 127, // 120 bit IP6-prefix is used for comparison
 // Mixed case of IPv4 and IPv6
 '192.168.1.1/24',   '::ffff:c0a8:01ff', 127, // 127 bit IP6-prefix is used for comparison
 '::ffff:c0a8:0101', '192.168.1.255',    120, // 120 bit IP6-prefix is used for comparison
 '::192.168.1.1/30', '192.168.1.255/24', 127, // 120 bit IP6-prefix is used for comparison
]
| extend result = ipv6_is_match(ip1_string, ip2_string, prefix)

Output

ip1_stringip2_stringprefixresult
192.168.1.1192.168.1.0311
192.168.1.1/24192.168.1.255311
192.168.1.1192.168.1.255241
fe80::85d:e82c:9446:7994fe80::85d:e82c:9446:79951271
fe80::85d:e82c:9446:7994/127fe80::85d:e82c:9446:79981201
fe80::85d:e82c:9446:7994/120fe80::85d:e82c:9446:79981271
192.168.1.1/24::ffff:c0a8:01ff1271
::ffff:c0a8:0101192.168.1.2551201
::192.168.1.1/30192.168.1.255/241271

12.139 - isascii()

Learn how to use the isascii() to check if the argument is a valid ascii string.

Returns true if the argument is a valid ASCII string.

Syntax

isascii(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The value to check if a valid ASCII string.

Returns

A boolean value indicating whether value is a valid ASCII string.

Example

print result=isascii("some string")

Output

result
true

12.140 - isempty()

Learn how to use the isempty() function to check if the argument is an empty string.

Returns true if the argument is an empty string or is null.

Syntax

isempty(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The value to check if empty or null.

Returns

A boolean value indicating whether value is an empty string or is null.

Example

xisempty(x)
""true
"x"false
parse_json("")true
parse_json("[]")false
parse_json("{}")false

12.141 - isfinite()

Learn how to use the isfinite() function to check if the input is a finite value.

Returns whether the input is a finite value, meaning it’s not infinite or NaN.

Syntax

isfinite(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The value to check if finite.

Returns

true if number is finite and false otherwise.

Example

range x from -1 to 1 step 1
| extend y = 0.0
| extend div = 1.0*x/y
| extend isfinite=isfinite(div)

Output

xydivisfinite
-10-∞0
00NaN0
100
  • To check if a value is null, see isnull().
  • To check if a value is infinite, see isinf().
  • To check if a value is NaN (Not-a-Number), see isnan().

12.142 - isinf()

Learn how to use the isinf() function to check if the input is an infinite value.

Returns whether the input is an infinite (positive or negative) value.

Syntax

isinf(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The value to check if infinite.

Returns

true if number is positive or negative infinity, and false otherwise.

Example

range x from -1 to 1 step 1
| extend y = 0.0
| extend div = 1.0*x/y
| extend isinf=isinf(div)

Output

xydivisinf
-10-∞true
00NaNfalse
10true
  • To check if a value is null, see isnull().
  • To check if a value is finite, see isfinite().
  • To check if a value is NaN (Not-a-Number), see isnan().

12.143 - isnan()

Learn how to use the isnan() function to check if the input is a not-a-number (NaN) value.

Returns whether the input is a Not-a-Number (NaN) value.

Syntax

isnan(number)

Parameters

NameTypeRequiredDescription
numberscalar✔️The value to check if NaN.

Returns

true if number is NaN and false otherwise.

Example

range x from -1 to 1 step 1
| extend y = (-1*x) 
| extend div = 1.0*x/y
| extend isnan=isnan(div)

Output

xydivisnan
-11-1false
00NaNtrue
1-1-1false
  • To check if a value is null, see isnull().
  • To check if a value is finite, see isfinite().
  • To check if a value is infinite, see isinf().

12.144 - isnotempty()

Learn how to use the isnotempty() function to check if the argument isn’t an empty string.

Returns true if the argument isn’t an empty string, and it isn’t null.

Syntax

isnotempty(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to check if not empty or null.

Returns

true if value isn't an empty string and isn't null, and false otherwise.

Example

Find the storm events for which there’s a begin location.

StormEvents
| where isnotempty(BeginLat) and isnotempty(BeginLon)

12.145 - isnotnull()

Learn how to use the isnotnull() function to check if the argument isn’t null.

Returns true if the argument isn’t null.

Syntax

isnotnull(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to check if not null.

Returns

true if value isn’t null and false otherwise.

Example

Find the storm events for which there’s a begin location.

StormEvents
| where isnotnull(BeginLat) and isnotnull(BeginLon)

12.146 - isnull()

Learn how to use the isnull() function to check if the argument value is null.

Evaluates an expression and returns a Boolean result indicating whether the value is null.

Syntax

isnull(Expr)

Parameters

NameTypeRequiredDescription
Exprscalar✔️The expression to evaluate for a null value. The expression can be any scalar value; strings, arrays, and objects always return false. For more information, see The dynamic data type.

Returns

Returns true if the value is null and false otherwise. Empty strings, arrays, property bags, and objects always return false.

The following table lists return values for different expressions (x):

xisnull(x)
""false
"x"false
parse_json("")true
parse_json("[]")false
parse_json("{}")false

Example

Find the storm events for which there’s no begin location.

StormEvents
| where isnull(BeginLat) and isnull(BeginLon)
| project StartTime, EndTime, EpisodeId, EventId, State, EventType, BeginLat, BeginLon

Output

StartTimeEndTimeEpisodeIdEventIdStateEventTypeBeginLatBeginLon
2007-01-01T00:00:00Z2007-01-01T05:00:00Z417123358WISCONSINWinter Storm
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927067MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927068MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927069MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927065MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927070MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927071MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927072MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z238011735MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927073MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z224010857TEXASDrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z224010858TEXASDrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927066MINNESOTADrought

12.147 - isutf8()

Learn how to use the isutf8() function to check if the argument is a valid utf8 string.

Returns true if the argument is a valid UTF8 string.

Syntax

isutf8(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The value to check if a valid UTF8 string.

Returns

A boolean value indicating whether value is a valid UTF8 string.

Example

print result=isutf8("some string")
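
Output

result
true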

12.148 - jaccard_index()

Learn how to use the jaccard_index() function to calculate the Jaccard index of two input sets.

Calculates the Jaccard index of two input sets.

Syntax

jaccard_index(set1, set2)

Parameters

NameTypeRequiredDescription
set1dynamic✔️The array representing the first set for the calculation.
set2dynamic✔️The array representing the second set for the calculation.

Returns

The Jaccard index of the two input sets. The Jaccard index formula is |set1 ∩ set2| / |set1 ∪ set2|.

Examples

print set1=dynamic([1,2,3]), set2=dynamic([1,2,3,4])
| extend jaccard=jaccard_index(set1, set2)

Output

set1set2jaccard
[1,2,3][1,2,3,4]0.75

12.149 - log()

Learn how to use the log() function to return the natural logarithm of the input.

The natural logarithm is the base-e logarithm: the inverse of the natural exponential function (exp).

Syntax

log(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The number for which to calculate the logarithm.

Returns

  • log() returns the natural logarithm of the input.
  • null if the argument is negative or null or can’t be converted to a real value.

Example

print result=log(5)

Output

result
1.6094379124341003
  • For common (base-10) logarithms, see log10().
  • For base-2 logarithms, see log2().

12.150 - log10()

Learn how to use the log10() function to return the common (base-10) logarithm of the input.

log10() returns the common (base-10) logarithm of the input.

Syntax

log10(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The number for which to calculate the base-10 logarithm.

Returns

  • The common logarithm is the base-10 logarithm: the inverse of the exponential function (exp) with base 10.
  • null if the argument is negative or null or can’t be converted to a real value.

Example

print result=log10(5)

Output

result
0.69897000433601886
  • For natural (base-e) logarithms, see log().
  • For base-2 logarithms, see log2()

12.151 - log2()

Learn how to use the log2() function to return the base-2 logarithm of the input.

The logarithm is the base-2 logarithm: the inverse of the exponential function (exp) with base 2.

Syntax

log2(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The number for which to calculate the base-2 logarithm.

Returns

  • The logarithm is the base-2 logarithm: the inverse of the exponential function (exp) with base 2.
  • null if the argument is negative or null or can’t be converted to a real value.

Example

print result=log2(5)

Output

result
2.3219280948873622
  • For natural (base-e) logarithms, see log().
  • For common (base-10) logarithms, see log10().

12.152 - loggamma()

Learn how to use the loggamma() function to compute the log of the absolute value of the gamma function.

Computes the logarithm of the absolute value of the gamma function.

Syntax

loggamma(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The number for which to calculate the gamma.

Example

print result=loggamma(5)

Output

result
3.1780538303479458

Returns

  • Returns the natural logarithm of the absolute value of the gamma function of number.
  • To compute the gamma function, see gamma().

12.153 - make_datetime()

Learn how to use the make_datetime() function to create a datetime scalar value from the specified date and time.

Creates a datetime scalar value from the specified date and time.

Syntax

make_datetime(year, month, day)

make_datetime(year, month, day, hour, minute)

make_datetime(year, month, day, hour, minute, second)

Parameters

NameTypeRequiredDescription
yearint✔️The year value between 0 to 9999.
monthint✔️The month value between 1 to 12.
dayint✔️The day value between 1 to 28-31, depending on the month.
hourintThe hour value between 0 to 23.
minuteintThe minute value between 0 to 59.
seconddoubleThe second value between 0 to 59.9999999.

Returns

If successful, the result is a datetime value. Otherwise, the result is null.

Example

print year_month_day = make_datetime(2017,10,01)

Output

year_month_day
2017-10-01 00:00:00.0000000
print year_month_day_hour_minute = make_datetime(2017,10,01,12,10)

Output

year_month_day_hour_minute
2017-10-01 12:10:00.0000000
print year_month_day_hour_minute_second = make_datetime(2017,10,01,12,11,0.1234567)

Output

year_month_day_hour_minute_second
2017-10-01 12:11:00.1234567

12.154 - make_timespan()

Learn how to use the make_timespan() function to create a timespan scalar value from the specified time period.

Creates a timespan scalar value from the specified time period.

Syntax

make_timespan(hour, minute)

make_timespan(hour, minute, second)

make_timespan(day, hour, minute, second)

Parameters

NameTypeRequiredDescription
dayint✔️The day.
hourint✔️The hour. A value from 0-23.
minuteintThe minute. A value from 0-59.
secondrealThe second. A value from 0 to 59.9999999.

Returns

If the creation is successful, the result will be a timespan value. Otherwise, the result will be null.

Example

print ['timespan'] = make_timespan(1,12,30,55.123)

Output

timespan
1.12:30:55.1230000
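
The shorter overloads omit the day (and optionally the second). A minimal sketch of the two-argument form; the expected output assumes the default timespan formatting:

print ['timespan'] = make_timespan(12, 30)

Output

timespan
12:30:00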

12.155 - max_of()

Learn how to use the max_of() function to return the maximum value of all argument expressions.

Returns the maximum value of all argument expressions.

Syntax

max_of(arg, arg_2, [ arg_3, … ])

Parameters

NameTypeRequiredDescription
arg_iscalar✔️The values to compare.
  • All arguments must be of the same type.
  • Maximum of 64 arguments is supported.
  • Non-null values take precedence over null values.

Returns

The maximum value of all argument expressions.

Examples

Find the largest number

This query returns the maximum value among the listed numbers.

print result = max_of(10, 1, -3, 17) 

Output

result
17

Find the maximum value in a data-table

This query returns the highest value from columns A and B. Notice that non-null values take precedence over null values.

datatable (A: int, B: int)
[
    1, 6,
    8, 1,
    int(null), 2,
    1, int(null),
    int(null), int(null)
]
| project max_of(A, B)

Output

result
6
8
2
1
(null)

Find the maximum datetime

This query returns the later of the two datetime values from columns A and B.

datatable (A: datetime, B: datetime)
[
    datetime(2024-12-15 07:15:22), datetime(2024-12-15 07:15:24),
    datetime(2024-12-15 08:00:00), datetime(2024-12-15 09:30:00),
    datetime(2024-12-15 10:45:00), datetime(2024-12-14 10:45:00)
]
| project maxDate = max_of(A, B)

Output

maxDate
2024-12-15 07:15:24
2024-12-15 09:30:00
2024-12-15 10:45:00

12.156 - merge_tdigest()

Learn how to use the merge_tdigest() function to merge columns.

Merges tdigest results. This is the scalar version of the aggregate function tdigest_merge().

Read more about the underlying algorithm (T-Digest) and the estimated error.

Syntax

merge_tdigest(exprs)

Parameters

NameTypeRequiredDescription
exprsdynamic✔️One or more comma-separated column references that have the tdigest values to be merged.

Returns

The result of merging the columns Expr1, Expr2, … ExprN into one tdigest.

Example

range x from 1 to 10 step 1 
| extend y = x + 10
| summarize tdigestX = tdigest(x), tdigestY = tdigest(y)
| project merged = merge_tdigest(tdigestX, tdigestY)
| project percentile_tdigest(merged, 100, typeof(long))

Output

percentile_tdigest_merged
20

12.157 - min_of()

Learn how to use the min_of() function to return the minimum value of all argument expressions.

Returns the minimum value of several evaluated scalar expressions.

Syntax

min_of (arg, arg_2, [ arg_3, … ])

Parameters

NameTypeRequiredDescription
arg, arg_2, …scalar✔️A comma-separated list of 2-64 scalar expressions to compare. The function returns the minimum value among these expressions.
  • All arguments must be of the same type.
  • Maximum of 64 arguments is supported.
  • Non-null values take precedence over null values.

Returns

The minimum value of all argument expressions.

Examples

Find the minimum value in a list:

print result=min_of(10, 1, -3, 17) 

Output

result
-3

Find the minimum value in a data-table. Non-null values take precedence over null values:

datatable (A: int, B: int)
[
    5, 2,
    10, 1,
    int(null), 3,
    1, int(null),
    int(null), int(null)
]
| project result = min_of(A, B)

Output

result
2
1
3
1
(null)

12.158 - monthofyear()

Learn how to use the monthofyear() function to get the integer representation of the month.

Returns the integer number from 1-12 representing the month number of the given year.

Syntax

monthofyear(date)

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to find the month number.

Returns

An integer from 1-12 representing the month number of the given year.

Example

print result=monthofyear(datetime("2015-12-14"))

Output

result
12

12.159 - new_guid()

Learn how to use the new_guid() function to return a random GUID (Globally Unique Identifier).

Returns a random GUID (Globally Unique Identifier).

Syntax

new_guid()

Returns

A new value of type guid.

Example

print guid=new_guid()

Output

guid
2157828f-e871-479a-9d1c-17ffde915095

12.160 - not()

Learn how to use the not() function to reverse the value of its boolean argument.

Reverses the value of its bool argument.

Syntax

not(expr)

Parameters

NameTypeRequiredDescription
exprscalar✔️An expression that evaluates to a boolean value. The result of this expression is reversed.

Returns

Returns the reversed logical value of its bool argument.

Examples

The following query returns the number of events that are not a tornado, per state.

StormEvents 
| where not(EventType == "Tornado") 
| summarize count() by State

Output

StateCount
TEXAS4485
KANSAS3005
IOWA2286
ILLINOIS1999
MISSOURI1971
GEORGIA1927
MINNESOTA1863
WISCONSIN1829
NEBRASKA1715
NEW YORK1746

The following query excludes records where either the EventType is hail, or the state is Alaska.

StormEvents
| where not(EventType == "Hail" or State == "Alaska")

The next query excludes records where both the EventType is hail and the state is Alaska simultaneously.

StormEvents
| where not(EventType == "Hail" and State == "Alaska")

Combine with other conditions

You can also combine the not() function with other conditions. The following query returns all records where the EventType is not a flood and the property damage is greater than $1,000,000.

StormEvents
| where not(EventType == "Flood") and DamageProperty > 1000000

12.161 - now()

Learn how to use the now() function to return the current UTC time.

Returns the current UTC time, optionally offset by a given timespan.

The current UTC time will stay the same across all uses of now() in a single query statement, even if there’s technically a small time difference between when each now() runs.

Syntax

now([ offset ])

Parameters

NameTypeRequiredDescription
offsettimespanA timespan to add to the current UTC clock time. The default value is 0.

Returns

The current UTC clock time, plus the offset time if provided, as a datetime.

Examples

Show the current time

print now()

Show the time 2 days ago

print now(-2d)
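Check that the time is fixed within a statement

As a small sketch of the note above, both columns below call now() separately; because the current UTC time is evaluated once per query statement, identical is expected to be true.

print t1 = now(), t2 = now()
| extend identical = (t1 == t2)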

Find time elapsed from a given event

The following example shows the time elapsed since the start of the storm events.

StormEvents
| extend Elapsed=now() - StartTime
| take 10

Get the date relative to a specific time interval

let T = datatable(label: string, timespanValue: timespan) [
    "minute", 60s, 
    "hour", 1h, 
    "day", 1d, 
    "year", 365d
];
T 
| extend timeAgo = now() - timespanValue

Output

labeltimespanValuetimeAgo
year365.00:00:002022-06-19T08:22:54.6623324Z
day1.00:00:002023-06-18T08:22:54.6623324Z
hour01:00:002023-06-19T07:22:54.6623324Z
minute00:01:002023-06-19T08:21:54.6623324Z

12.162 - pack_all()

Learn how to use the pack_all() function to create a dynamic object from all the columns of the tabular expression.

Creates a dynamic property bag object from all the columns of the tabular expression.

Syntax

pack_all([ ignore_null_empty ])

Parameters

NameTypeRequiredDescription
ignore_null_emptyboolIndicates whether to ignore null/empty columns and exclude them from the resulting property bag. The default value is false.

Example

The following query uses pack_all() to create packed columns for the table below.

SourceNumberTargetNumberCharsCount
555-555-1234555-555-121246
555-555-1234555-555-121350
555-555-131342
555-555-345674
datatable(SourceNumber:string,TargetNumber:string,CharsCount:long)
[
'555-555-1234','555-555-1212',46,
'555-555-1234','555-555-1213',50,
'555-555-1313','',42, 
'','555-555-3456',74 
]
| extend Packed=pack_all(), PackedIgnoreNullEmpty=pack_all(true)

Output

SourceNumberTargetNumberCharsCountPackedPackedIgnoreNullEmpty
555-555-1234555-555-121246{“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1212”, “CharsCount”: 46}{“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1212”, “CharsCount”: 46}
555-555-1234555-555-121350{“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1213”, “CharsCount”: 50}{“SourceNumber”:“555-555-1234”, “TargetNumber”:“555-555-1213”, “CharsCount”: 50}
555-555-131342{“SourceNumber”:“555-555-1313”, “TargetNumber”:"", “CharsCount”: 42}{“SourceNumber”:“555-555-1313”, “CharsCount”: 42}
555-555-345674{“SourceNumber”:"", “TargetNumber”:“555-555-3456”, “CharsCount”: 74}{“TargetNumber”:“555-555-3456”, “CharsCount”: 74}

12.163 - pack_array()

Learn how to use the pack_array() function to pack all input values into a dynamic array.

Packs all input values into a dynamic array.

Syntax

pack_array(value1, [ value2, … ])

pack_array(*)

Parameters

NameTypeRequiredDescription
value1…valueNstring✔️Input expressions to be packed into a dynamic array.
The wildcard *stringProviding the wildcard * packs all input columns into a dynamic array.

Returns

A dynamic array that includes the values of value1, value2, … valueN.

Example

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project pack_array(x, y, z)

Output

Column1
[1,2,4]
[2,4,8]
[3,6,12]
range x from 1 to 3 step 1
| extend y = tostring(x * 2)
| extend z = (x * 2) * 1s
| project pack_array(x, y, z)

Output

Column1
[1,“2”,“00:00:02”]
[2,“4”,“00:00:04”]
[3,“6”,“00:00:06”]

12.164 - parse_command_line()

Learn how to use the parse_command_line() function to parse a unicode command-line string.

Parses a Unicode command-line string and returns a dynamic array of the command-line arguments.

Syntax

parse_command_line(command_line, parser_type)

Parameters

NameTypeRequiredDescription
command_linestring✔️The command line value to parse.
parser_typestring✔️The only value that is currently supported is "windows", which parses the command line the same way as CommandLineToArgvW.

Returns

A dynamic array of the command-line arguments.

Example

print parse_command_line("echo \"hello world!\"", "windows")

Output

Result
[“echo”,“hello world!”]

12.165 - parse_csv()

Learn how to use the parse_csv() function to split a given string representing a single record of comma-separated values.

Splits a given string representing a single record of comma-separated values and returns a string array with these values.

Syntax

parse_csv(csv_text)

Parameters

NameTypeRequiredDescription
csv_textstring✔️A single record of comma-separated values.

Returns

A string array that contains the split values.

Examples

Filter by count of values in record

Count the conference sessions with more than three participants.

ConferenceSessions
| where array_length(parse_csv(participants)) > 3
| distinct *

Output

sessionidparticipants
CON-PRT157Guy Reginiano, Guy Yehudy, Pankaj Suri, Saeed Copty
BRK3099Yoni Leibowitz, Eric Fleischman, Robert Pack, Avner Aharoni

Use escaping quotes

print result=parse_csv('aa,"b,b,b",cc,"Escaping quotes: ""Title""","line1\nline2"')

Output

result
[
“aa”,
“b,b,b”,
“cc”,
“Escaping quotes: "Title"”,
“line1\nline2”
]

CSV with multiple records

Only the first record is taken since this function doesn’t support multiple records.

print result_multi_record=parse_csv('record1,a,b,c\nrecord2,x,y,z')

Output

result_multi_record
[
“record1”,
“a”,
“b”,
“c”
]

12.166 - parse_ipv4_mask()

Learn how to use the parse_ipv4_mask() function to convert an IPv4 input string and netmask to a 64-bit wide long number in big-endian order.

Converts the input string of IPv4 and netmask to a signed, 64-bit wide, long number representation in big-endian order.

Syntax

parse_ipv4_mask(ip , prefix)

Parameters

NameTypeRequiredDescription
ipstring✔️The IPv4 address to convert to a long number.
prefixint✔️An integer from 0 to 32 representing the number of most-significant bits that are taken into account.

Returns

If conversion is successful, the result is a long number. If conversion isn’t successful, the result is null.

Example

print parse_ipv4_mask("127.0.0.1", 24)
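Assuming the 24-bit mask zeroes out the host bits, 127.0.0.1 maps to 127.0.0.0, whose big-endian long representation is 127 × 2^24 = 2130706432, so the expected output is:

Output

print_0
2130706432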

12.167 - parse_ipv4()

Learn how to use the parse_ipv4() function to convert an IPv4 string to a long number in big-endian order.

Converts IPv4 string to a signed 64-bit wide long number representation in big-endian order.

Syntax

parse_ipv4(ip)

Parameters

NameTypeRequiredDescription
ipstring✔️The IPv4 that is converted to long. The value may include net-mask using IP-prefix notation.

Returns

If conversion is successful, the result is a long number. If conversion isn’t successful, the result is null.

Example

datatable(ip_string: string)
[
    '192.168.1.1', '192.168.1.1/24', '255.255.255.255/31'
]
| extend ip_long = parse_ipv4(ip_string)

Output

ip_stringip_long
192.168.1.13232235777
192.168.1.1/243232235776
255.255.255.255/314294967294

12.168 - parse_ipv6_mask()

Learn how to use the parse_ipv6_mask() function to convert IPv6 or IPv4 strings and netmask to a canonical IPv6 string representation.

Converts IPv6/IPv4 string and netmask to a canonical IPv6 string representation.

Syntax

parse_ipv6_mask(ip, prefix)

Parameters

NameTypeRequiredDescription
ipstringThe IPv6/IPv4 network address to convert to canonical IPv6 representation. The value may include net-mask using IP-prefix notation.
prefixintAn integer from 0 to 128 representing the number of most-significant bits that are taken into account.

Returns

If conversion is successful, the result is a string representing a canonical IPv6 network address. If conversion isn’t successful, the result is an empty string.

Example

datatable(ip_string: string, netmask: long)
[
    // IPv4 addresses
    '192.168.255.255', 120,  // 120-bit netmask is used
    '192.168.255.255/24', 124,  // 120-bit netmask is used, as IPv4 address doesn't use upper 8 bits
    '255.255.255.255', 128,  // 128-bit netmask is used
    // IPv6 addresses
    'fe80::85d:e82c:9446:7994', 128,     // 128-bit netmask is used
    'fe80::85d:e82c:9446:7994/120', 124, // 120-bit netmask is used
    // IPv6 with IPv4 notation
    '::192.168.255.255', 128,  // 128-bit netmask is used
    '::192.168.255.255/24', 128,  // 120-bit netmask is used, as IPv4 address doesn't use upper 8 bits
]
| extend ip6_canonical = parse_ipv6_mask(ip_string, netmask)

Output

ip_stringnetmaskip6_canonical
192.168.255.2551200000:0000:0000:0000:0000:ffff:c0a8:ff00
192.168.255.255/241240000:0000:0000:0000:0000:ffff:c0a8:ff00
255.255.255.2551280000:0000:0000:0000:0000:ffff:ffff:ffff
fe80::85d:e82c:9446:7994128fe80:0000:0000:0000:085d:e82c:9446:7994
fe80::85d:e82c:9446:7994/120124fe80:0000:0000:0000:085d:e82c:9446:7900
::192.168.255.2551280000:0000:0000:0000:0000:ffff:c0a8:ffff
::192.168.255.255/241280000:0000:0000:0000:0000:ffff:c0a8:ff00

12.169 - parse_ipv6()

Learn how to use the parse_ipv6() function to convert IPv6 or IPv4 strings to a canonical IPv6 string representation.

Converts IPv6 or IPv4 string to a canonical IPv6 string representation.

Syntax

parse_ipv6(ip)

Parameters

NameTypeRequiredDescription
ipstring✔️The IPv6/IPv4 network address that is converted to canonical IPv6 representation. The value may include net-mask using IP-prefix notation.

Returns

If conversion is successful, the result is a string representing a canonical IPv6 network address. If conversion isn’t successful, the result is an empty string.

Example

datatable(ipv4: string)
[
    '192.168.255.255', '192.168.255.255/24', '255.255.255.255'
]
| extend ipv6 = parse_ipv6(ipv4)

Output

ipv4ipv6
192.168.255.2550000:0000:0000:0000:0000:ffff:c0a8:ffff
192.168.255.255/240000:0000:0000:0000:0000:ffff:c0a8:ff00
255.255.255.2550000:0000:0000:0000:0000:ffff:ffff:ffff

12.170 - parse_json() function

Learn how to use the parse_json() function to return an object of type dynamic.

Interprets a string as a JSON value and returns the value as dynamic. If possible, the value is converted into relevant data types. For strict parsing with no data type conversion, use extract() or extract_json() functions.

It’s better to use the parse_json() function over the extract_json() function when you need to extract more than one element of a JSON compound object. Use dynamic() when possible.

Syntax

parse_json(json)

Parameters

NameTypeRequiredDescription
jsonstring✔️The string in the form of a JSON-formatted value or a dynamic property bag to parse as JSON.

Returns

An object of type dynamic that is determined by the value of json:

  • If json is of type dynamic, its value is used as-is.
  • If json is of type string, and is a properly formatted JSON string, then the string is parsed, and the value produced is returned.
  • If json is of type string, but it isn’t a properly formatted JSON string, then the returned value is an object of type dynamic that holds the original string value.

Example

In the following example, when context_custom_metrics is a string that looks like this:

{"duration":{"value":118.0,"count":5.0,"min":100.0,"max":150.0,"stdDev":0.0,"sampledValue":118.0,"sum":118.0}}

then the following query retrieves the value of the duration slot in the object, and from that it retrieves two slots, duration.value and duration.min (118.0 and 100.0, respectively).

datatable(context_custom_metrics:string)
[
    '{"duration":{"value":118.0,"count":5.0,"min":100.0,"max":150.0,"stdDev":0.0,"sampledValue":118.0,"sum":118.0}}'
]
| extend d = parse_json(context_custom_metrics)
| extend duration_value = d.duration.value, duration_min = d.duration.min

Notes

It’s common to have a JSON string describing a property bag in which one of the “slots” is another JSON string.

For example:

let d='{"a":123, "b":"{\\"c\\":456}"}';
print d

In such cases, it isn’t only necessary to invoke parse_json twice, but also to make sure that in the second call, tostring is used. Otherwise, the second call to parse_json will just pass on the input to the output as-is, because its declared type is dynamic.

let d='{"a":123, "b":"{\\"c\\":456}"}';
print d_b_c=parse_json(tostring(parse_json(d).b)).c

12.171 - parse_path()

Learn how to use the parse_path() function to parse a file path.

Parses a file path string and returns a dynamic object that contains the following parts of the path:

  • Scheme
  • RootPath
  • DirectoryPath
  • DirectoryName
  • Filename
  • Extension
  • AlternateDataStreamName

In addition to the simple paths with both types of slashes, the function supports paths with:

  • Schemas. For example, “file://…”
  • Shared paths. For example, “\\shareddrive\users…”
  • Long paths. For example, “\\?\C:…”

Syntax

parse_path(path)

Parameters

NameTypeRequiredDescription
pathstring✔️The file path.

Returns

An object of type dynamic that includes the path components listed above.

Example

datatable(p:string) 
[
    @"C:\temp\file.txt",
    @"temp\file.txt",
    "file://C:/temp/file.txt:some.exe",
    @"\\shared\users\temp\file.txt.gz",
    "/usr/lib/temp/file.txt"
]
| extend path_parts = parse_path(p)

Output

ppath_parts
C:\temp\file.txt{“Scheme”:"",“RootPath”:“C:”,“DirectoryPath”:“C:\temp”,“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:""}
temp\file.txt{“Scheme”:"",“RootPath”:"",“DirectoryPath”:“temp”,“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:""}
file://C:/temp/file.txt:some.exe{“Scheme”:“file”,“RootPath”:“C:”,“DirectoryPath”:“C:/temp”,“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:“some.exe”}
\shared\users\temp\file.txt.gz{“Scheme”:"",“RootPath”:"",“DirectoryPath”:"\\shared\users\temp",“DirectoryName”:“temp”,“Filename”:“file.txt.gz”,“Extension”:“gz”,“AlternateDataStreamName”:""}
/usr/lib/temp/file.txt{“Scheme”:"",“RootPath”:"",“DirectoryPath”:"/usr/lib/temp",“DirectoryName”:“temp”,“Filename”:“file.txt”,“Extension”:“txt”,“AlternateDataStreamName”:""}

12.172 - parse_url()

Learn how to use the parse_url() function to parse a URL string.

Parses an absolute URL string and returns a dynamic object that contains the URL parts.

Syntax

parse_url(url)

Parameters

NameTypeRequiredDescription
urlstring✔️An absolute URL, including its scheme, or the query part of the URL. For example, use the absolute https://bing.com instead of bing.com.

Returns

An object of type dynamic that includes the URL components: Scheme, Host, Port, Path, Username, Password, Query Parameters, Fragment.

Example

print Result=parse_url("scheme://username:password@host:1234/this/is/a/path?k1=v1&k2=v2#fragment")

Output

Result
{“Scheme”:“scheme”, “Host”:“host”, “Port”:“1234”, “Path”:“this/is/a/path”, “Username”:“username”, “Password”:“password”, “Query Parameters”:"{“k1”:“v1”, “k2”:“v2”}", “Fragment”:“fragment”}

12.173 - parse_urlquery()

Learn how to use the parse_urlquery() function to return a dynamic object that contains the query parameters.

Returns a dynamic object that contains the query parameters.

Syntax

parse_urlquery(query)

Parameters

NameTypeRequiredDescription
querystring✔️The query part of the URL. The format must follow URL query standards (key=value& …).

Returns

An object of type dynamic that includes the query parameters.

Examples

print Result=parse_urlquery("k1=v1&k2=v2&k3=v3")

Output

Result
{ “Query Parameters”:"{“k1”:“v1”, “k2”:“v2”, “k3”:“v3”}" }

The following example uses a function to extract specific query parameters.

let getQueryParamValue = (querystring: string, param: string) {
    let params = parse_urlquery(querystring);
    tostring(params["Query Parameters"].[param])
};
print UrlQuery = 'view=vs-2019&preserve-view=true'
| extend view = getQueryParamValue(UrlQuery, 'view')
| extend preserve = getQueryParamValue(UrlQuery, 'preserve-view')

Output

UrlQueryviewpreserve
view=vs-2019&preserve-view=truevs-2019true

12.174 - parse_user_agent()

Learn how to use the parse_user_agent() to return a dynamic object that contains information about the user-agent.

Interprets a user-agent string, which identifies the user’s browser and provides certain system details to servers hosting the websites the user visits. The result is returned as dynamic.

Syntax

parse_user_agent(user-agent-string, look-for)

Parameters

NameTypeRequiredDescription
user-agent-stringstring✔️The user-agent string to parse.
look-forstring or dynamic✔️The value to search for in user-agent-string. The possible options are “browser”, “os”, or “device”. If only a single parsing target is required, it can be passed as a string parameter. If two or three targets are required, they can be passed as a dynamic array.

Returns

An object of type dynamic that contains the information about the requested parsing targets.

Browser: Family, MajorVersion, MinorVersion, Patch

OperatingSystem: Family, MajorVersion, MinorVersion, Patch, PatchMinor

Device: Family, Brand, Model

When the function is used in a query, make sure it runs in a distributed manner on multiple machines. If queries with this function are frequently used, you may want to pre-create the results via update policy, but you need to take into account that using this function inside the update policy will increase the ingestion latency.

Examples

Look-for parameter as string

print useragent = "Mozilla/5.0 (Windows; U; en-US) AppleWebKit/531.9 (KHTML, like Gecko) AdobeAIR/2.5.1"
| extend x = parse_user_agent(useragent, "browser") 

Expected result is a dynamic object:

{
  "Browser": {
    "Family": "AdobeAIR",
    "MajorVersion": "2",
    "MinorVersion": "5",
    "Patch": "1"
  }
}

Look-for parameter as dynamic array

print useragent = "Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaN81-3/10.0.032 Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/4"
| extend x = parse_user_agent(useragent, dynamic(["browser","os","device"])) 

Expected result is a dynamic object:

{
  "Browser": {
    "Family": "Nokia OSS Browser",
    "MajorVersion": "3",
    "MinorVersion": "1",
    "Patch": ""
  },
  "OperatingSystem": {
    "Family": "Symbian OS",
    "MajorVersion": "9",
    "MinorVersion": "2",
    "Patch": "",
    "PatchMinor": ""
  },
  "Device": {
    "Family": "Nokia N81",
    "Brand": "Nokia",
    "Model": "N81-3"
  }
}

12.175 - parse_version()

Learn how to use the parse_version() function to convert the input string representation of a version to a comparable decimal number.

Converts the input string representation of a version number into a decimal number that can be compared.

Syntax

parse_version (version)

Parameters

NameTypeRequiredDescription
versionstring✔️The version to be parsed.

Returns

If conversion is successful, the result is a decimal; otherwise, the result is null.

Examples

Parse version strings

The following query shows version strings with their parsed version numbers.

let dt = datatable(v: string)
    [
    "0.0.0.5", "0.0.7.0", "0.0.3", "0.2", "0.1.2.0", "1.2.3.4", "1"
];
dt
| extend parsedVersion = parse_version(v)

Output

vparsedVersion
0.0.0.55
0.0.7.0700,000,000
0.0.3300,000,000
0.220,000,000,000,000,000
0.1.2.010,000,000,200,000,000
1.2.3.41,000,000,020,000,000,300,000,004
11,000,000,000,000,000,000,000,000

Compare parsed version strings

The following query identifies which labs have equipment needing updates by comparing their parsed version strings to the minimum version number “1.0.0.0”.

let dt = datatable(lab: string, v: string)
[
    "Lab A", "0.0.0.5",
    "Lab B", "0.0.7.0",
    "Lab D","0.0.3",
    "Lab C", "0.2", 
    "Lab G", "0.1.2.0",
    "Lab F", "1.2.3.4",
    "Lab E", "1",
];
dt
| extend parsed_version = parse_version(v)
| extend needs_update = iff(parsed_version < parse_version("1.0.0.0"), "Yes", "No")
| project lab, v, needs_update
| sort by lab asc , v, needs_update

Output

labvneeds_update
Lab A0.0.0.5Yes
Lab B0.0.7.0Yes
Lab C0.2Yes
Lab D0.0.3Yes
Lab E1No
Lab F1.2.3.4No
Lab G0.1.2.0Yes

12.176 - parse_xml()

Learn how to use the parse_xml() function to return a dynamic object that is determined by the value of XML.

Interprets a string as an XML value, converts the value to a JSON, and returns the value as dynamic.

Syntax

parse_xml(xml)

Parameters

NameTypeRequiredDescription
xmlstring✔️The XML-formatted string value to parse.

Returns

An object of type dynamic that is determined by the value of xml, or null, if the XML format is invalid.

The conversion is done as follows:

XMLJSONAccess
<e/>{ “e”: null }o.e
<e>text</e>{ “e”: “text” }o.e
<e name="value" />{ “e”:{"@name": “value”} }o.e["@name"]
<e name="value">text</e>{ “e”: { “@name”: “value”, “#text”: “text” } }o.e["@name"] o.e["#text"]
<e> <a>text</a> <b>text</b> </e>{ “e”: { “a”: “text”, “b”: “text” } }o.e.a o.e.b
<e> <a>text</a> <a>text</a> </e>{ “e”: { “a”: [“text”, “text”] } }o.e.a[0] o.e.a[1]
<e> text <a>text</a> </e>{ “e”: { “#text”: “text”, “a”: “text” } }o.e["#text"] o.e.a

Example

In the following example, when context_custom_metrics is a string that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<duration>
    <value>118.0</value>
    <count>5.0</count>
    <min>100.0</min>
    <max>150.0</max>
    <stdDev>0.0</stdDev>
    <sampledValue>118.0</sampledValue>
    <sum>118.0</sum>
</duration>

then the following CSL Fragment translates the XML to the following JSON:

{
    "duration": {
        "value": 118.0,
        "count": 5.0,
        "min": 100.0,
        "max": 150.0,
        "stdDev": 0.0,
        "sampledValue": 118.0,
        "sum": 118.0
    }
}

and retrieves the value of the duration slot in the object, and from that it retrieves two slots, duration.value and duration.min (118.0 and 100.0, respectively).

T
| extend d=parse_xml(context_custom_metrics) 
| extend duration_value=d.duration.value, duration_min=d["duration"]["min"]

12.177 - percentile_array_tdigest()

Learn how to use the percentile_array_tdigest() to calculate the percentile value of an expression.

Calculates the percentile results from the tdigest results (which were generated by tdigest() or tdigest_merge()).

Syntax

percentiles_array_tdigest(tdigest, percentile1 [, percentile2, …])

percentiles_array_tdigest(tdigest, Dynamic array [, typeLiteral ])

Parameters

NameTypeRequiredDescription
tdigeststring✔️The tdigest or tdigest_merge() results used to calculate the percentiles.
percentilereal✔️A value or comma-separated list of values that specifies the percentiles.
Dynamic arraydynamic✔️A dynamic array of real numbers that specify the percentiles.
typeLiteralstringA type literal. For example, typeof(long). If provided, the result set is of this type.

Returns

The percentile or percentiles value for each tdigest, returned as a dynamic array that includes the results (similar to percentiles()).

Examples

StormEvents
| summarize tdigestRes = tdigest(DamageProperty) by State
| project percentiles_array_tdigest(tdigestRes, range(0, 100, 50), typeof(int))

Output

percentile_tdigest_tdigestRes
[0,0,0]
[0,0,62000000]
[0,0,110000000]
[0,0,1200000]
[0,0,250000]

12.178 - percentile_tdigest()

Learn how to use the percentile_tdigest() function to calculate the percentile value of an expression.

Calculates the percentile result from the tdigest results (which were generated by tdigest() or tdigest_merge()).

Syntax

percentile_tdigest(expr, percentile1 [, typeLiteral])

Parameters

NameTypeRequiredDescription
exprstring✔️An expression that was generated by tdigest or tdigest_merge().
percentilelong✔️The value that specifies the percentile.
typeLiteralstringA type literal. If provided, the result set will be of this type. For example, typeof(long) will cast all results to type long.

Returns

The percentile value of each value in expr.

Examples

StormEvents
| summarize tdigestRes = tdigest(DamageProperty) by State
| project percentile_tdigest(tdigestRes, 100)

Output

percentile_tdigest_tdigestRes
0
62000000
110000000
1200000
250000
StormEvents
| summarize tdigestRes = tdigest(DamageProperty) by State
| union (StormEvents | summarize tdigestRes = tdigest(EndTime) by State)
| project percentile_tdigest(tdigestRes, 100)

Output

percentile_tdigest_tdigestRes
[0]
[62000000]
[“2007-12-20T11:30:00.0000000Z”]
[“2007-12-31T23:59:00.0000000Z”]

12.179 - percentrank_tdigest()

Learn how to use the percentrank_tdigest() function to calculate the approximate rank of the value in a set.

Calculates the approximate rank of the value in a set, where rank is expressed as a percentage of the set’s size. This function can be viewed as the inverse of the percentile.

Syntax

percentrank_tdigest(digest, value)

Parameters

NameTypeRequiredDescription
digeststring✔️An expression that was generated by tdigest() or tdigest_merge().
valuescalar✔️An expression representing a value to be used for percentage ranking calculation.

Returns

The percentage rank of value in a dataset.

Examples

Getting the percentrank_tdigest() of a damage property valued at $4,490 returns approximately 85%:

StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project percentrank_tdigest(tdigestRes, 4490)

Output

Column1
85.0015237192293

Using percentile 85 over the damage property values returns $4,490:

StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project percentile_tdigest(tdigestRes, 85, typeof(long))

Output

percentile_tdigest_tdigestRes
4490

12.180 - pi()

Learn how to use the pi() function to return the constant value of Pi.

Returns the constant value of Pi.

Syntax

pi()

Returns

The double value of Pi (3.1415926…)
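Example

The following minimal sketch prints the constant; the result is the double value of π, approximately 3.14159265.

print pi()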

12.181 - pow()

Learn how to use the pow() function to calculate the base raised to the power of the exponent.

Returns the result of raising a base to the power of an exponent.

Syntax

pow(base, exponent)

Parameters

NameTypeRequiredDescription
baseint, real, or long✔️The base value.
exponentint, real, or long✔️The exponent value.

Returns

Returns base raised to the power exponent: base ^ exponent.

Example

print result=pow(2, 3)

Output

result
8

12.182 - punycode_domain_from_string

This article describes the punycode_domain_from_string() command.

Decodes input string from encoded Internationalized Domain Name in Applications (IDNA) punycode form.

Syntax

punycode_domain_from_string(encoded_string)

Parameters

NameTypeRequiredDescription
encoded_stringstring✔️An IDNA string to be decoded from punycode form. The function accepts one string argument.

Returns

  • Returns a string that represents the original Internationalized Domain Name.
  • Returns an empty result if decoding failed.

Example

datatable(encoded:string)
[
    "xn--Ge-mia.Bulg.edu", 
    "xn--Lin-8na.Celtchair.org", 
    "xn--Ry-lja8c.xn--Jng-uta63a.xn--Bng-9ka.com", 
] 
| extend domain=punycode_domain_from_string(encoded)
encodeddomain
xn--Ge-mia.Bulg.eduGáe.Bulg.edu
xn--Lin-8na.Celtchair.orgLúin.Celtchair.org
xn--Ry-lja8c.xn--Jng-uta63a.xn--Bng-9ka.comRúyì.Jīngū.Bàng.com

12.183 - punycode_domain_to_string

This article describes the punycode_domain_to_string() command.

Encodes Internationalized Domain Name in Applications (IDNA) string to Punycode form.

Syntax

punycode_domain_to_string(domain)

Parameters

NameTypeRequiredDescription
domainstring✔️A string to be encoded to punycode form. The function accepts one string argument.

Returns

  • Returns a string that represents punycode-encoded original string.
  • Returns an empty result if encoding failed.

Examples

datatable(domain:string )['Lê Lợi。Thuận Thiên。com', 'Riðill。Skáldskaparmál。org', "Kaledvoulc'h.Artorījos.edu"]
| extend str=punycode_domain_to_string(domain)
domainstr
Lê Lợi。Thuận Thiên。comxn--L Li-gpa4517b.xn--Thun Thin-s4a7194f.com
Riðill。Skáldskaparmál。orgxn--Riill-jta.xn--Skldskaparml-dbbj.org
Kaledvoulc’h.Artorījos.eduKaledvoulc’h.xn--Artorjos-ejb.edu

12.184 - punycode_from_string

This article describes the punycode_from_string() command.

Encodes input string to Punycode form. The result string contains only ASCII characters. The result string doesn’t start with “xn--”.

Syntax

punycode_from_string('input_string')

Parameters

NameTypeRequiredDescription
input_stringstring✔️A string to be encoded to punycode form. The function accepts one string argument.

Returns

  • Returns a string that represents punycode-encoded original string.
  • Returns an empty result if encoding failed.

Examples

 print encoded = punycode_from_string('académie-française')
encoded
acadmie-franaise-npb1a
 print domain='艺术.com'
| extend domain_vec = split(domain, '.')
| extend encoded_host = punycode_from_string(tostring(domain_vec[0]))
| extend encoded_domain = strcat('xn--', encoded_host, '.', domain_vec[1])
domaindomain_vecencoded_hostencoded_domain
艺术.com[“艺术”,“com”]cqv902dxn--cqv902d.com

12.185 - punycode_to_string

This article describes the punycode_to_string() command.

Decodes input string from punycode form. The string shouldn’t contain the initial xn--, and must contain only ASCII characters.

Syntax

punycode_to_string('input_string')

Parameters

NameTypeRequiredDescription
input_stringstring✔️A string to be decoded from punycode form. The function accepts one string argument.

Returns

  • Returns a string that represents the original, decoded string.
  • Returns an empty result if decoding failed.

Example

 print decoded = punycode_to_string('acadmie-franaise-npb1a')
decoded
académie-française

12.186 - radians()

Learn how to use the radians() function to convert angle values from degrees to radians.

Converts angle value in degrees into value in radians, using formula radians = (PI / 180 ) * angle_in_degrees

Syntax

radians(degrees)

Parameters

NameTypeRequiredDescription
degreesreal✔️The angle in degrees.

Returns

The corresponding angle in radians for an angle specified in degrees.

Example

print radians0 = radians(90), radians1 = radians(180), radians2 = radians(360) 

Output

radians0radians1radians2
1.57079632679493.141592653589796.28318530717959

12.187 - rand()

Learn how to use the rand() function to return a random number.

Returns a random number.

rand()
rand(1000)

Syntax

  • rand() - returns a value of type real with a uniform distribution in the range [0.0, 1.0).
  • rand( N ) - returns a value of type real chosen with a uniform distribution from the set {0.0, 1.0, …, N - 1}.
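Example

The following sketch shows both forms; results vary per run, within the ranges described above.

print uniform_0_1 = rand(), integer_0_999 = rand(1000)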

12.188 - range()

Learn how to use the range() function to generate a dynamic array holding a series of equally spaced values.

Generates a dynamic array holding a series of equally spaced values.

Syntax

range(start, stop [, step])

Parameters

NameTypeRequiredDescription
startscalar✔️The value of the first element in the resulting array.
stopscalar✔️The maximum value of the last element in the resulting array, such that the last value in the series is less than or equal to the stop value.
stepscalarThe difference between two consecutive elements of the array. The default value for step is 1 for numeric and 1h for timespan or datetime.

Returns

A dynamic array whose values are: start, start + step, … up to and including stop. The array is truncated if the maximum number of results allowed is reached.

Examples

The following example returns an array of numbers from one to eight, with an increment of three.

print r = range(1, 8, 3)

Output

r
[1,4,7]

The following example returns an array with all dates from the year 2007.

print r = range(datetime(2007-01-01), datetime(2007-12-31), 1d)

Output

r
[“2007-01-01T00:00:00.0000000Z”,“2007-01-02T00:00:00.0000000Z”,“2007-01-03T00:00:00.0000000Z”,…..,“2007-12-31T00:00:00.0000000Z”]

The following example returns an array with numbers between one and three.

print range(1, 3)

Output

print_0
[1,2,3]

The following example returns a range of hours between one hour and five hours.

print range(1h, 5h)

Output

print_0
["01:00:00","02:00:00","03:00:00","04:00:00","05:00:00"]

The following example returns a truncated array as the range exceeds the maximum results limit. The example demonstrates that the limit is exceeded by using the mv-expand operator to expand the array into multiple records and then counting the number of records.

" target="_blank">Run the query

print r = range(1,1000000000) 
| mv-expand r 
| count

Output

Count
1,048,576

12.189 - rank_tdigest()

Learn how to use the rank_tdigest() function to calculate the approximate rank of the value in a set.

Calculates the approximate rank of the value in a set. The rank of a value v in a set S is defined as the count of members of S that are smaller than or equal to v, where S is represented by its tdigest.

Syntax

rank_tdigest(digest, value)

Parameters

NameTypeRequiredDescription
digeststringAn expression that was generated by tdigest() or tdigest_merge().
valuescalarAn expression representing a value to be used for ranking calculation.

Returns

The rank of each value in a dataset.

Examples

In a sorted list (1-1000), the rank of 685 is its index:

range x from 1 to 1000 step 1
| summarize t_x=tdigest(x)
| project rank_of_685=rank_tdigest(t_x, 685)

Output

rank_of_685
685

This query calculates the rank of value 4490$ over all damage properties costs:

StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project rank_of_4490=rank_tdigest(tdigestRes, 4490) 

Output

rank_of_4490
50207

Getting the estimated percentage of the rank (by dividing by the set size):

StormEvents
| summarize tdigestRes = tdigest(DamageProperty), count()
| project rank_tdigest(tdigestRes, 4490) * 100.0 / count_

Output

Column1
85.0015237192293

The percentile 85 of the damage properties costs is 4490$:

StormEvents
| summarize tdigestRes = tdigest(DamageProperty)
| project percentile_tdigest(tdigestRes, 85, typeof(long))

Output

percentile_tdigest_tdigestRes
4490

12.190 - regex_quote()

Learn how to use the regex_quote() function to return a string that escapes all regular expression characters.

Returns a string that escapes all regular expression characters.

Syntax

regex_quote(string)

Parameters

NameTypeRequiredDescription
stringstring✔️The string to escape.

Returns

Returns string where all regex expression characters are escaped.

Example

print result = regex_quote('(so$me.Te^xt)')

Output

result
\(so\$me\.Te\^xt\)

12.191 - repeat()

Learn how to use the repeat() function to generate a dynamic array containing a series comprised of repeated numbers.

Generates a dynamic array containing a series comprised of repeated numbers.

Syntax

repeat(value, count)

Parameters

NameTypeRequiredDescription
valuebool, int, long, real, datetime, string or timespan✔️The value of the element in the resulting array.
countint✔️The count of the elements in the resulting array.

Returns

If count is greater than zero, a dynamic array containing count copies of value is returned. If count is equal to zero, an empty array is returned. If count is less than zero, a null value is returned.

Examples

The following example returns [1, 1, 1]:

print r = repeat(1, 3)
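
The following sketch illustrates the boundary behavior described under Returns: a count of zero yields an empty array, and a negative count yields a null value.

print empty_array = repeat(1, 0), null_value = repeat(1, -1)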

12.192 - replace_regex()

Learn how to use the replace_regex() function to replace all regex matches with another string.

Replaces all regular expression matches with a specified pattern.

Syntax

replace_regex(source,lookup_regex, rewrite_pattern)

Parameters

NameTypeRequiredDescription
sourcestring✔️The text to search and replace.
lookup_regexstring✔️The regular expression to search for in text. The expression can contain capture groups in parentheses. To match over multiple lines, use the m or s flags. For more information on flags, see Grouping and flags.
rewrite_patternstring✔️The replacement regex for any match made by lookup_regex. Use \0 to refer to the whole match, \1 for the first capture group, \2 and so on for subsequent capture groups.

Returns

Returns the source after replacing all matches of lookup_regex with evaluations of rewrite_pattern. Matches do not overlap.

Example

range x from 1 to 5 step 1
| extend str=strcat('Number is ', tostring(x))
| extend replaced=replace_regex(str, @'is (\d+)', @'was: \1')

Output

xstrreplaced
1Number is 1.000000Number was: 1.000000
2Number is 2.000000Number was: 2.000000
3Number is 3.000000Number was: 3.000000
4Number is 4.000000Number was: 4.000000
5Number is 5.000000Number was: 5.000000

12.193 - replace_string()

Learn how to use the replace_string() function to replace all string matches with another string.

Replaces all string matches with a specified string.

To replace multiple strings, see replace_strings().

Syntax

replace_string(text, lookup, rewrite)

Parameters

NameTypeRequiredDescription
textstring✔️The source string.
lookupstring✔️The string to be replaced.
rewritestring✔️The replacement string.

Returns

Returns the text after replacing all matches of lookup with evaluations of rewrite. Matches don’t overlap.

Examples

Replace words in a string

The following example uses replace_string() to replace the word “cat” with the word “hamster” in the Message string.

print Message="A magic trick can turn a cat into a dog"
| extend Outcome = replace_string(
        Message, "cat", "hamster")  // Lookup strings

Output

MessageOutcome
A magic trick can turn a cat into a dogA magic trick can turn a hamster into a dog

Generate and modify a sequence of numbers

The following example creates a table with column x containing numbers from one to five, incremented by one. It adds the column str that concatenates “Number is ” with the string representation of the x column values using the strcat() function. It then adds the replaced column where “was” replaces the word “is” in the strings from the str column.

range x from 1 to 5 step 1
| extend str=strcat('Number is ', tostring(x))
| extend replaced=replace_string(str, 'is', 'was')

Output

xstrreplaced
1Number is 1.000000Number was 1.000000
2Number is 2.000000Number was 2.000000
3Number is 3.000000Number was 3.000000
4Number is 4.000000Number was 4.000000
5Number is 5.000000Number was 5.000000

12.194 - replace_strings()

Learn how to use the replace_strings() function to replace multiple strings matches with multiple replacement strings.

Replaces all strings matches with specified strings.

To replace an individual string, see replace_string().

Syntax

replace_strings(text, lookups, rewrites)

Parameters

NameTypeRequiredDescription
textstring✔️The source string.
lookupsdynamic✔️The array that includes the lookup strings. Array elements that aren’t strings are ignored.
rewritesdynamic✔️The array that includes the rewrites. Array elements that aren’t strings are ignored (no replacement is made).

Returns

Returns text after replacing all matches of lookups with evaluations of rewrites. Matches don’t overlap.

Examples

Simple replacement

print Message="A magic trick can turn a cat into a dog"
| extend Outcome = replace_strings(
        Message,
        dynamic(['cat', 'dog']), // Lookup strings
        dynamic(['dog', 'pigeon']) // Replacements
        )
MessageOutcome
A magic trick can turn a cat into a dogA magic trick can turn a dog into a pigeon

Replacement with an empty string

Replacement with an empty string removes the matching string.

print Message="A magic trick can turn a cat into a dog"
| extend Outcome = replace_strings(
        Message,
        dynamic(['turn', ' into a dog']), // Lookup strings
        dynamic(['disappear', '']) // Replacements
        )
MessageOutcome
A magic trick can turn a cat into a dogA magic trick can disappear a cat

Replacement order

The order of match elements matters: the earlier match takes precedence. Note the difference between Outcome1 and Outcome2: This vs Thwas.

 print Message="This is an example of using replace_strings()"
| extend Outcome1 = replace_strings(
        Message,
        dynamic(['This', 'is']), // Lookup strings
        dynamic(['This', 'was']) // Replacements
        ),
        Outcome2 = replace_strings(
        Message,
        dynamic(['is', 'This']), // Lookup strings
        dynamic(['was', 'This']) // Replacements
        )
MessageOutcome1Outcome2
This is an example of using replace_strings()This was an example of using replace_strings()Thwas was an example of using replace_strings()

Nonstring replacement

Replacement elements that aren’t strings aren’t applied, and the original string is kept. The match is still considered valid, and other possible replacements aren’t performed on the matched string. In the following example, ‘This’ isn’t replaced with the numeric 12345, and it remains in the output unaffected by a possible match with ‘is’.

 print Message="This is an example of using replace_strings()"
| extend Outcome = replace_strings(
        Message,
        dynamic(['This', 'is']), // Lookup strings
        dynamic([12345, 'was']) // Replacements
        )
MessageOutcome
This is an example of using replace_strings()This was an example of using replace_strings()

12.195 - reverse()

Learn how to use the reverse() function to reverse the order of the input string.

The function reverses the order of the input string. If the input value isn’t of type string, then the function forcibly casts the value to type string.

Syntax

reverse(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The input value.

Returns

The reverse order of a string value.

Examples

print str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
| extend rstr = reverse(str)

Output

strrstr
ABCDEFGHIJKLMNOPQRSTUVWXYZZYXWVUTSRQPONMLKJIHGFEDCBA
print ['int'] = 12345, ['double'] = 123.45, 
['datetime'] = datetime(2017-10-15 12:00), ['timespan'] = 3h
| project rint = reverse(['int']), rdouble = reverse(['double']), 
rdatetime = reverse(['datetime']), rtimespan = reverse(['timespan'])

Output

rintrdoublerdatetimertimespan
5432154.321Z0000000.00:00:21T51-01-710200:00:30

12.196 - round()

Learn how to use the round() function to round the number to the specified precision.

Returns the rounded number to the specified precision.

Syntax

round(number [, precision])

Parameters

NameTypeRequiredDescription
numberlong or real✔️The number to calculate the round on.
precisionintThe number of digits to round to. The default is 0.

Returns

The rounded number to the specified precision.

Round is different from the bin() function in that the round() function rounds a number to a specific number of digits while the bin() function rounds the value to an integer multiple of a given bin size. For example, round(2.15, 1) returns 2.2 while bin(2.15, 1) returns 2.

Examples

round(2.98765, 3)   // 2.988
round(2.15, 1)      // 2.2
round(2.15)         // 2 // equivalent to round(2.15, 0)
round(-50.55, -2)   // -100
round(21.5, -1)     // 20
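
The following sketch contrasts round() with bin() as described above; the expected results are 2.2 and 2, respectively.

print rounded = round(2.15, 1), binned = bin(2.15, 1)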

12.197 - Scalar Functions

Learn how to use scalar functions to perform calculations that return a single value.

This article lists all available scalar functions grouped by type. For aggregation functions, see Aggregation function types.

Binary functions

Function NameDescription
binary_and()Returns a result of the bitwise and operation between two values.
binary_not()Returns a bitwise negation of the input value.
binary_or()Returns a result of the bitwise or operation of the two values.
binary_shift_left()Returns binary shift left operation on a pair of numbers: a « n.
binary_shift_right()Returns binary shift right operation on a pair of numbers: a » n.
binary_xor()Returns a result of the bitwise xor operation of the two values.
bitset_count_ones()Returns the number of set bits in the binary representation of a number.

Conversion functions

Function NameDescription
tobool()Convert inputs to boolean (signed 8-bit) representation.
todatetime()Converts input to datetime scalar.
todouble()Converts the input to a value of type real.
tostring()Converts input to a string representation.
totimespan()Converts input to timespan scalar.

DateTime/timespan functions

Function NameDescription
ago()Subtracts the given timespan from the current UTC clock time.
datetime_add()Calculates a new datetime from a specified datepart multiplied by a specified amount, added to a specified datetime.
datetime_diff()Returns the number of the specified periods between two datetime values.
datetime_local_to_utc()Converts local datetime to UTC datetime using a time-zone specification.
datetime_part()Extracts the requested date part as an integer value.
datetime_utc_to_local()Converts UTC datetime to local datetime using a time-zone specification.
dayofmonth()Returns the integer number representing the day number of the given month.
dayofweek()Returns the integer number of days since the preceding Sunday, as a timespan.
dayofyear()Returns the integer number representing the day number of the given year.
endofday()Returns the end of the day containing the date, shifted by an offset, if provided.
endofmonth()Returns the end of the month containing the date, shifted by an offset, if provided.
endofweek()Returns the end of the week containing the date, shifted by an offset, if provided.
endofyear()Returns the end of the year containing the date, shifted by an offset, if provided.
format_datetime()Formats a datetime parameter based on the format pattern parameter.
format_timespan()Formats a format-timespan parameter based on the format pattern parameter.
getyear()Returns the year part of the datetime argument.
hourofday()Returns the integer number representing the hour number of the given date.
make_datetime()Creates a datetime scalar value from the specified date and time.
make_timespan()Creates a timespan scalar value from the specified time period.
monthofyear()Returns the integer number that represents the month number of the given year.
now()Returns the current UTC clock time, optionally offset by a given timespan.
startofday()Returns the start of the day containing the date, shifted by an offset, if provided.
startofmonth()Returns the start of the month containing the date, shifted by an offset, if provided.
startofweek()Returns the start of the week containing the date, shifted by an offset, if provided.
startofyear()Returns the start of the year containing the date, shifted by an offset, if provided.
todatetime()Converts input to datetime scalar.
totimespan()Converts input to timespan scalar.
unixtime_microseconds_todatetime()Converts unix-epoch microseconds to UTC datetime.
unixtime_milliseconds_todatetime()Converts unix-epoch milliseconds to UTC datetime.
unixtime_nanoseconds_todatetime()Converts unix-epoch nanoseconds to UTC datetime.
unixtime_seconds_todatetime()Converts unix-epoch seconds to UTC datetime.
weekofyear()Returns an integer representing the week number.

Dynamic/array functions

Function NameDescription
array_concat()Concatenates a number of dynamic arrays to a single array.
array_iff()Applies element-wise iif function on arrays.
array_index_of()Searches the array for the specified item, and returns its position.
array_length()Calculates the number of elements in a dynamic array.
array_reverse()Reverses the order of the elements in a dynamic array.
array_rotate_left()Rotates values inside a dynamic array to the left.
array_rotate_right()Rotates values inside a dynamic array to the right.
array_shift_left()Shifts values inside a dynamic array to the left.
array_shift_right()Shifts values inside a dynamic array to the right.
array_slice()Extracts a slice of a dynamic array.
array_sort_asc()Sorts a collection of arrays in ascending order.
array_sort_desc()Sorts a collection of arrays in descending order.
array_split()Builds an array of arrays split from the input array.
array_sum()Calculates the sum of a dynamic array.
bag_has_key()Checks whether a dynamic bag column contains a given key.
bag_keys()Enumerates all the root keys in a dynamic property-bag object.
bag_merge()Merges dynamic property-bags into a dynamic property-bag with all properties merged.
bag_pack()Creates a dynamic object (property bag) from a list of names and values.
bag_pack_columns()Creates a dynamic object (property bag) from a list of columns.
bag_remove_keys()Removes keys and associated values from a dynamic property-bag.
bag_set_key()Sets a given key to a given value in a dynamic property-bag.
jaccard_index()Computes the Jaccard index of two sets.
pack_all()Creates a dynamic object (property bag) from all the columns of the tabular expression.
pack_array()Packs all input values into a dynamic array.
repeat()Generates a dynamic array holding a series of equal values.
set_difference()Returns an array of the set of all distinct values that are in the first array but aren’t in other arrays.
set_has_element()Determines whether the specified array contains the specified element.
set_intersect()Returns an array of the set of all distinct values that are in all arrays.
set_union()Returns an array of the set of all distinct values that are in any of provided arrays.
treepath()Enumerates all the path expressions that identify leaves in a dynamic object.
zip()The zip function accepts any number of dynamic arrays. Returns an array whose elements are each an array with the elements of the input arrays of the same index.

Window scalar functions

Function NameDescription
next()For the serialized row set, returns a value of a specified column from the later row according to the offset.
prev()For the serialized row set, returns a value of a specified column from the earlier row according to the offset.
row_cumsum()Calculates the cumulative sum of a column.
row_number()Returns a row’s number in the serialized row set - consecutive numbers starting from a given index or from 1 by default.
row_rank_dense()Returns a row’s dense rank in the serialized row set.
row_rank_min()Returns a row’s minimal rank in the serialized row set.

Flow control functions

Function NameDescription
toscalar()Returns a scalar constant value of the evaluated expression.

Mathematical functions

Function NameDescription
abs()Calculates the absolute value of the input.
acos()Returns the angle whose cosine is the specified number (the inverse operation of cos()).
asin()Returns the angle whose sine is the specified number (the inverse operation of sin()).
atan()Returns the angle whose tangent is the specified number (the inverse operation of tan()).
atan2()Calculates the angle, in radians, between the positive x-axis and the ray from the origin to the point (y, x).
beta_cdf()Returns the standard cumulative beta distribution function.
beta_inv()Returns the inverse of the beta cumulative probability beta density function.
beta_pdf()Returns the probability density beta function.
cos()Returns the cosine function.
cot()Calculates the trigonometric cotangent of the specified angle, in radians.
degrees()Converts angle value in radians into value in degrees, using formula degrees = (180 / PI) * angle-in-radians.
erf()Returns the error function.
erfc()Returns the complementary error function.
exp()The base-e exponential function of x, which is e raised to the power x: e^x.
exp10()The base-10 exponential function of x, which is 10 raised to the power x: 10^x.
exp2()The base-2 exponential function of x, which is 2 raised to the power x: 2^x.
gamma()Computes gamma function.
isfinite()Returns whether input is a finite value (isn’t infinite or NaN).
isinf()Returns whether input is an infinite (positive or negative) value.
isnan()Returns whether input is Not-a-Number (NaN) value.
log()Returns the natural logarithm function.
log10()Returns the common (base-10) logarithm function.
log2()Returns the base-2 logarithm function.
loggamma()Computes log of absolute value of the gamma function.
not()Reverses the value of its bool argument.
pi()Returns the constant value of Pi (π).
pow()Returns a result of raising to power.
radians()Converts angle value in degrees into value in radians, using formula radians = (PI / 180) * angle-in-degrees.
rand()Returns a random number.
range()Generates a dynamic array holding a series of equally spaced values.
round()Returns the rounded source to the specified precision.
sign()Sign of a numeric expression.
sin()Returns the sine function.
sqrt()Returns the square root function.
tan()Returns the tangent function.
welch_test()Computes the p-value of the Welch-test function.

Metadata functions

Function NameDescription
column_ifexists()Takes a column name as a string and a default value. Returns a reference to the column if it exists, otherwise - returns the default value.
current_cluster_endpoint()Returns the current cluster running the query.
current_database()Returns the name of the database in scope.
current_principal()Returns the current principal running this query.
current_principal_details()Returns details of the principal running the query.
current_principal_is_member_of()Checks group membership or principal identity of the current principal running the query.
cursor_after()Used to access to the records that were ingested after the previous value of the cursor.
estimate_data_size()Returns an estimated data size of the selected columns of the tabular expression.
extent_id()Returns a unique identifier that identifies the data shard (“extent”) that the current record resides in.
extent_tags()Returns a dynamic array with the tags of the data shard (“extent”) that the current record resides in.
ingestion_time()Retrieves the record’s $IngestionTime hidden datetime column, or null.

Rounding functions

Function NameDescription
bin()Rounds values down to an integer multiple of a given bin size.
bin_at()Rounds values down to a fixed-size “bin”, with control over the bin’s starting point. (See also bin function.)
ceiling()Calculates the smallest integer greater than, or equal to, the specified numeric expression.

Conditional functions

Function NameDescription
case()Evaluates a list of predicates and returns the first result expression whose predicate is satisfied.
coalesce()Evaluates a list of expressions and returns the first non-null (or non-empty for string) expression.
iff()Evaluates the first argument (the predicate), and returns the value of either the second or third argument, depending on whether the predicate evaluated to true (second) or false (third).
max_of()Returns the maximum value of several evaluated numeric expressions.
min_of()Returns the minimum value of several evaluated numeric expressions.

Series element-wise functions

Function NameDescription
series_abs()Calculates the element-wise absolute value of the numeric series input.
series_acos()Calculates the element-wise arccosine function of the numeric series input.
series_add()Calculates the element-wise addition of two numeric series inputs.
series_asin()Calculates the element-wise arcsine function of the numeric series input.
series_atan()Calculates the element-wise arctangent function of the numeric series input.
series_ceiling()Calculates the element-wise ceiling function of the numeric series input.
series_cos()Calculates the element-wise cosine function of the numeric series input.
series_divide()Calculates the element-wise division of two numeric series inputs.
series_equals()Calculates the element-wise equals (==) logic operation of two numeric series inputs.
series_exp()Calculates the element-wise base-e exponential function (e^x) of the numeric series input.
series_floor()Calculates the element-wise floor function of the numeric series input.
series_greater()Calculates the element-wise greater (>) logic operation of two numeric series inputs.
series_greater_equals()Calculates the element-wise greater or equals (>=) logic operation of two numeric series inputs.
series_less()Calculates the element-wise less (<) logic operation of two numeric series inputs.
series_less_equals()Calculates the element-wise less or equal (<=) logic operation of two numeric series inputs.
series_log()Calculates the element-wise natural logarithm function (base-e) of the numeric series input.
series_multiply()Calculates the element-wise multiplication of two numeric series inputs.
series_not_equals()Calculates the element-wise not equals (!=) logic operation of two numeric series inputs.
series_pow()Calculates the element-wise power of two numeric series inputs.
series_sign()Calculates the element-wise sign of the numeric series input.
series_sin()Calculates the element-wise sine function of the numeric series input.
series_subtract()Calculates the element-wise subtraction of two numeric series inputs.
series_tan()Calculates the element-wise tangent function of the numeric series input.

Series processing functions

Function NameDescription
series_cosine_similarity()Calculates the cosine similarity of two numeric series.
series_decompose()Does a decomposition of the series into components.
series_decompose_anomalies()Finds anomalies in a series based on series decomposition.
series_decompose_forecast()Forecast based on series decomposition.
series_dot_product()Calculates the dot product of two numeric series.
series_fill_backward()Performs backward fill interpolation of missing values in a series.
series_fill_const()Replaces missing values in a series with a specified constant value.
series_fill_forward()Performs forward fill interpolation of missing values in a series.
series_fill_linear()Performs linear interpolation of missing values in a series.
series_fft()Applies the Fast Fourier Transform (FFT) on a series.
series_fir()Applies a Finite Impulse Response filter on a series.
series_fit_2lines()Applies two segments linear regression on a series, returning multiple columns.
series_fit_2lines_dynamic()Applies two segments linear regression on a series, returning dynamic object.
series_fit_line()Applies linear regression on a series, returning multiple columns.
series_fit_line_dynamic()Applies linear regression on a series, returning dynamic object.
series_fit_poly()Applies polynomial regression on a series, returning multiple columns.
series_ifft()Applies the Inverse Fast Fourier Transform (IFFT) on a series.
series_iir()Applies an Infinite Impulse Response filter on a series.
series_magnitude()Calculates the magnitude of the numeric series.
series_outliers()Scores anomaly points in a series.
series_pearson_correlation()Calculates the Pearson correlation coefficient of two series.
series_periods_detect()Finds the most significant periods that exist in a time series.
series_periods_validate()Checks whether a time series contains periodic patterns of given lengths.
series_seasonal()Finds the seasonal component of the series.
series_stats()Returns statistics for a series in multiple columns.
series_stats_dynamic()Returns statistics for a series in dynamic object.
series_sum()Calculates the sum of numeric series elements.

String functions

Function NameDescription
base64_encode_tostring()Encodes a string as base64 string.
base64_encode_fromguid()Encodes a GUID as base64 string.
base64_decode_tostring()Decodes a base64 string to a UTF-8 string.
base64_decode_toarray()Decodes a base64 string to an array of long values.
base64_decode_toguid()Decodes a base64 string to a GUID.
countof()Counts occurrences of a substring in a string. Plain string matches may overlap; regex matches don’t.
extract()Get a match for a regular expression from a text string.
extract_all()Get all matches for a regular expression from a text string.
extract_json()Get a specified element out of a JSON text using a path expression.
has_any_index()Searches the string for items specified in the array and returns the position in the array of the first item found in the string.
indexof()Function reports the zero-based index of the first occurrence of a specified string within input string.
isempty()Returns true if the argument is an empty string or is null.
isnotempty()Returns true if the argument isn’t an empty string or a null.
isnotnull()Returns true if the argument is not null.
isnull()Evaluates its sole argument and returns a bool value indicating if the argument evaluates to a null value.
parse_command_line()Parses a Unicode command line string and returns an array of the command line arguments.
parse_csv()Splits a given string representing comma-separated values and returns a string array with these values.
parse_ipv4()Converts input to long (signed 64-bit) number representation.
parse_ipv4_mask()Converts input string and IP-prefix mask to long (signed 64-bit) number representation.
parse_ipv6()Converts IPv6 or IPv4 string to a canonical IPv6 string representation.
parse_ipv6_mask()Converts IPv6 or IPv4 string and netmask to a canonical IPv6 string representation.
parse_json()Interprets a string as a JSON value and returns the value as dynamic.
parse_url()Parses an absolute URL string and returns a dynamic object contains all parts of the URL.
parse_urlquery()Parses a url query string and returns a dynamic object contains the Query parameters.
parse_version()Converts input string representation of version to a comparable decimal number.
replace_regex()Replace all regex matches with another string.
replace_string()Replace all single string matches with a specified string.
replace_strings()Replace all multiple strings matches with specified strings.
punycode_from_string()Encodes domain name to Punycode form.
punycode_to_string()Decodes domain name from Punycode form.
reverse()Reverses the input string.
split()Splits a given string according to a given delimiter and returns a string array with the contained substrings.
strcat()Concatenates between 1 and 64 arguments.
strcat_delim()Concatenates between 2 and 64 arguments, with delimiter, provided as first argument.
strcmp()Compares two strings.
strlen()Returns the length, in characters, of the input string.
strrep()Repeats given string provided number of times (default - 1).
substring()Extracts a substring from a source string starting from some index to the end of the string.
toupper()Converts a string to upper case.
translate()Replaces a set of characters (‘searchList’) with another set of characters (‘replacementList’) in a given a string.
trim()Removes all leading and trailing matches of the specified regular expression.
trim_end()Removes trailing match of the specified regular expression.
trim_start()Removes leading match of the specified regular expression.
url_decode()Converts an encoded URL into a regular URL representation.
url_encode()Converts characters of the input URL into a format that can be transmitted over the Internet.

IPv4/IPv6 functions

Function NameDescription
ipv4_compare()Compares two IPv4 strings.
ipv4_is_in_range()Checks if IPv4 string address is in IPv4-prefix notation range.
ipv4_is_in_any_range()Checks if IPv4 string address is any of the IPv4-prefix notation ranges.
ipv4_is_match()Matches two IPv4 strings.
ipv4_is_private()Checks if IPv4 string address belongs to a set of private network IPs.
ipv4_netmask_suffix()Returns the value of the IPv4 netmask suffix from an IPv4 string address.
parse_ipv4()Converts input string to long (signed 64-bit) number representation.
parse_ipv4_mask()Converts input string and IP-prefix mask to long (signed 64-bit) number representation.
ipv4_range_to_cidr_list()Converts IPv4 address range to a list of CIDR ranges.
ipv6_compare()Compares two IPv4 or IPv6 strings.
ipv6_is_match()Matches two IPv4 or IPv6 strings.
parse_ipv6()Converts IPv6 or IPv4 string to a canonical IPv6 string representation.
parse_ipv6_mask()Converts IPv6 or IPv4 string and netmask to a canonical IPv6 string representation.
format_ipv4()Parses input with a netmask and returns string representing IPv4 address.
format_ipv4_mask()Parses input with a netmask and returns string representing IPv4 address as CIDR notation.
ipv6_is_in_range()Checks if an IPv6 string address is in IPv6-prefix notation range.
ipv6_is_in_any_range()Checks if an IPv6 string address is in any of the IPv6-prefix notation ranges.
geo_info_from_ip_address()Retrieves geolocation information about IPv4 or IPv6 addresses.

IPv4 text match functions

Function NameDescription
has_ipv4()Searches for an IPv4 address in a text.
has_ipv4_prefix()Searches for an IPv4 address or prefix in a text.
has_any_ipv4()Searches for any of the specified IPv4 addresses in a text.
has_any_ipv4_prefix()Searches for any of the specified IPv4 addresses or prefixes in a text.

Type functions

Function NameDescription
gettype()Returns the runtime type of its single argument.

Scalar aggregation functions

Function NameDescription
dcount_hll()Calculates the dcount from hll results (which was generated by hll or hll-merge).
hll_merge()Merges hll results (scalar version of the aggregate version hll-merge()).
percentile_tdigest()Calculates the percentile result from tdigest results (which was generated by tdigest or merge_tdigest).
percentile_array_tdigest()Calculates the percentile array result from tdigest results (which was generated by tdigest or merge_tdigest).
percentrank_tdigest()Calculates the percentage ranking of a value in a dataset.
rank_tdigest()Calculates relative rank of a value in a set.
merge_tdigest()Merge tdigest results (scalar version of the aggregate version tdigest-merge()).

Geospatial functions

Function NameDescription
geo_angle()Calculates clockwise angle in radians between two lines on Earth.
geo_azimuth()Calculates clockwise angle in radians between the line from point1 to true north and a line from point1 to point2 on Earth.
geo_distance_2points()Calculates the shortest distance between two geospatial coordinates on Earth.
geo_distance_point_to_line()Calculates the shortest distance between a coordinate and a line or multiline on Earth.
geo_distance_point_to_polygon()Calculates the shortest distance between a coordinate and a polygon or multipolygon on Earth.
geo_intersects_2lines()Calculates whether the two lines or multilines intersect.
geo_intersects_2polygons()Calculates whether the two polygons or multipolygons intersect.
geo_intersects_line_with_polygon()Calculates whether the line or multiline intersects with polygon or multipolygon.
geo_intersection_2lines()Calculates the intersection of two lines or multilines.
geo_intersection_2polygons()Calculates the intersection of two polygons or multipolygons.
geo_intersection_line_with_polygon()Calculates the intersection of line or multiline with polygon or multipolygon.
geo_point_buffer()Calculates polygon that contains all points within the given radius of the point on Earth.
geo_point_in_circle()Calculates whether the geospatial coordinates are inside a circle on Earth.
geo_point_in_polygon()Calculates whether the geospatial coordinates are inside a polygon or a multipolygon on Earth.
geo_point_to_geohash()Calculates the Geohash string value for a geographic location.
geo_point_to_s2cell()Calculates the S2 Cell token string value for a geographic location.
geo_point_to_h3cell()Calculates the H3 Cell token string value for a geographic location.
geo_line_buffer()Calculates polygon or multipolygon that contains all points within the given radius of the input line or multiline on Earth.
geo_line_centroid()Calculates the centroid of line or a multiline on Earth.
geo_line_densify()Converts planar line edges to geodesics by adding intermediate points.
geo_line_length()Calculates the total length of line or a multiline on Earth.
geo_line_simplify()Simplifies line or a multiline by replacing nearly straight chains of short edges with a single long edge on Earth.
geo_line_to_s2cells()Calculates S2 cell tokens that cover a line or multiline on Earth. Useful geospatial join tool.
geo_polygon_area()Calculates the area of polygon or a multipolygon on Earth.
geo_polygon_buffer()Calculates polygon or multipolygon that contains all points within the given radius of the input polygon or multipolygon on Earth.
geo_polygon_centroid()Calculates the centroid of polygon or a multipolygon on Earth.
geo_polygon_densify()Converts polygon or multipolygon planar edges to geodesics by adding intermediate points.
geo_polygon_perimeter()Calculates the length of the boundary of polygon or a multipolygon on Earth.
geo_polygon_simplify()Simplifies polygon or a multipolygon by replacing nearly straight chains of short edges with a single long edge on Earth.
geo_polygon_to_s2cells()Calculates S2 Cell tokens that cover a polygon or multipolygon on Earth. Useful geospatial join tool.
geo_polygon_to_h3cells()Converts polygon to H3 cells. Useful geospatial join and visualization tool.
geo_geohash_to_central_point()Calculates the geospatial coordinates that represent the center of a Geohash rectangular area.
geo_geohash_neighbors()Calculates the geohash neighbors.
geo_geohash_to_polygon()Calculates the polygon that represents the geohash rectangular area.
geo_s2cell_to_central_point()Calculates the geospatial coordinates that represent the center of an S2 Cell.
geo_s2cell_neighbors()Calculates the S2 cell neighbors.
geo_s2cell_to_polygon()Calculates the polygon that represents the S2 Cell rectangular area.
geo_h3cell_to_central_point()Calculates the geospatial coordinates that represent the center of an H3 Cell.
geo_h3cell_neighbors()Calculates the H3 cell neighbors.
geo_h3cell_to_polygon()Calculates the polygon that represents the H3 Cell rectangular area.
geo_h3cell_parent()Calculates the H3 cell parent.
geo_h3cell_children()Calculates the H3 cell children.
geo_h3cell_level()Calculates the H3 cell resolution.
geo_h3cell_rings()Calculates the H3 cell Rings.
geo_simplify_polygons_array()Simplifies polygons by replacing nearly straight chains of short edges with a single long edge, while ensuring mutual boundaries consistency related to each other, on Earth.
geo_union_lines_array()Calculates the union of lines or multilines on Earth.
geo_union_polygons_array()Calculates the union of polygons or multipolygons on Earth.

Hash functions

Function NameDescription
hash()Returns a hash value for the input value.
hash_combine()Combines two or more hash values.
hash_many()Returns a combined hash value of multiple values.
hash_md5()Returns an MD5 hash value for the input value.
hash_sha1()Returns a SHA1 hash value for the input value.
hash_sha256()Returns a SHA256 hash value for the input value.
hash_xxhash64()Returns an XXHASH64 hash value for the input value.

Units conversion functions

Function NameDescription
convert_angle()Returns the input value converted from one angle unit to another
convert_energy()Returns the input value converted from one energy unit to another
convert_force()Returns the input value converted from one force unit to another
convert_length()Returns the input value converted from one length unit to another
convert_mass()Returns the input value converted from one mass unit to another
convert_speed()Returns the input value converted from one speed unit to another
convert_temperature()Returns the input value converted from one temperature unit to another
convert_volume()Returns the input value converted from one volume unit to another

12.198 - set_difference()

Learn how to use the set_difference() function to create a difference set of all distinct values in the first array that aren’t in the other array inputs.

Returns a dynamic (JSON) array of the set of all distinct values that are in the first array but aren’t in other arrays - (((arr1 \ arr2) \ arr3) \ …).

Syntax

set_difference(set1, set2 [,set3, …])

Parameters

NameTypeRequiredDescription
set1…setNdynamic✔️Arrays used to create a difference set. A minimum of 2 arrays are required. See pack_array.

Returns

Returns a dynamic array of the set of all distinct values that are in set1 but aren’t in other arrays.

Example

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend w = z * 2
| extend a1 = pack_array(x,y,x,z), a2 = pack_array(x, y), a3 = pack_array(x,y,w)
| project set_difference(a1, a2, a3)

Output

Column1
[4]
[8]
[12]
print arr = set_difference(dynamic([1,2,3]), dynamic([1,2,3]))

Output

arr
[]

12.199 - set_has_element()

Learn how to use the set_has_element() function to determine if the input set contains the specified value.

Determines whether the specified set contains the specified element.

Syntax

set_has_element(set, value)

Parameters

NameTypeRequiredDescription
setdynamic✔️The input array to search.
value✔️The value for which to search. The value should be of type long, int, double, datetime, timespan, decimal, string, guid, or bool.

Returns

true or false depending on if the value exists in the array.

Example

print arr=dynamic(["this", "is", "an", "example"]) 
| project Result=set_has_element(arr, "example")

Output

Result
true

Use array_index_of(arr, value) to find the position at which the value exists in the array. Both functions are equally performant.
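For example, the same lookup done with array_index_of() returns the zero-based position of the value in the array (3 in this case):

print arr = dynamic(["this", "is", "an", "example"])
| project Position = array_index_of(arr, "example")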

12.200 - set_intersect()

Learn how to use the set_intersect() function to create a set of the distinct values that are in all the array inputs.

Returns a dynamic array of the set of all distinct values that are in all arrays - (arr1 ∩ arr2 ∩ …).

Syntax

set_intersect(set1, set2 [, set3, …])

Parameters

NameTypeRequiredDescription
set1…setNdynamic✔️Arrays used to create an intersect set. A minimum of 2 arrays are required. See pack_array.

Returns

Returns a dynamic array of the set of all distinct values that are in all arrays.

Example

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend w = z * 2
| extend a1 = pack_array(x,y,x,z), a2 = pack_array(x, y), a3 = pack_array(w,x)
| project set_intersect(a1, a2, a3)

Output

Column1
[1]
[2]
[3]
print arr = set_intersect(dynamic([1, 2, 3]), dynamic([4,5]))

Output

arr
[]

12.201 - set_union()

Learn how to use the set_union() function to create a union set of all the distinct values in all of the array inputs.

Returns a dynamic array of the set of all distinct values that are in any of the arrays - (arr1 ∪ arr2 ∪ …).

Syntax

set_union(set1, set2 [, set3, …])

Parameters

NameTypeRequiredDescription
set1…setNdynamic✔️Arrays used to create a union set. A minimum of two arrays are required. See pack_array.

Returns

Returns a dynamic array of the set of all distinct values that are in any of arrays.

Example

Set from multiple dynamic array

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| extend w = z * 2
| extend a1 = pack_array(x,y,x,z), a2 = pack_array(x, y), a3 = pack_array(w)
| project a1,a2,a3,Out=set_union(a1, a2, a3)

Output

a1a2a3Out
[1,2,1,4][1,2][8][1,2,4,8]
[2,4,2,8][2,4][16][2,4,8,16]
[3,6,3,12][3,6][24][3,6,12,24]

Set from one dynamic array

datatable (Arr1: dynamic)
[
    dynamic(['A4', 'A2', 'A7', 'A2']), 
    dynamic(['C4', 'C7', 'C1', 'C4'])
] 
| extend Out=set_union(Arr1, Arr1)

Output

Arr1Out
[“A4”,“A2”,“A7”,“A2”][“A4”,“A2”,“A7”]
[“C4”,“C7”,“C1”,“C4”][“C4”,“C7”,“C1”]

12.202 - sign()

Learn how to use the sign() function to return the sign of the numeric expression.

Returns the sign of the numeric expression.

Syntax

sign(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The number for which to return the sign.

Returns

The positive (+1), zero (0), or negative (-1) sign of the specified expression.

Examples

print s1 = sign(-42), s2 = sign(0), s3 = sign(11.2)

Output

s1s2s3
-101

12.203 - sin()

Learn how to use the sin() function to return the sine value of the input.

Returns the sine function value of the specified angle. The angle is specified in radians.

Syntax

sin(number)

Parameters

NameTypeRequiredDescription
numberreal✔️The value in radians for which to calculate the sine.

Returns

The sine of number of radians.

Example

print sin(1)

Output

result
0.841470984807897

12.204 - split()

Learn how to use the split() function to split the source string according to a given delimiter.

The split() function takes a string and splits it into substrings based on a specified delimiter, returning the substrings in an array. Optionally, you can retrieve a specific substring by specifying its index.

Syntax

split(source, delimiter [, requestedIndex])

Parameters

NameTypeRequiredDescription
sourcestring✔️The source string that is split according to the given delimiter.
delimiterstring✔️The delimiter that will be used in order to split the source string.
requestedIndexintA zero-based index. If provided, the returned string array contains the requested substring at the index if it exists.

Returns

An array of substrings obtained by separating the source string by the specified delimiter, or a single substring at the specified requestedIndex.

Examples

print
    split("aa_bb", "_"),           // ["aa","bb"]
    split("aaa_bbb_ccc", "_", 1),  // ["bbb"]
    split("", "_"),                // [""]
    split("a__b", "_"),            // ["a","","b"]
    split("aabbcc", "bb")          // ["aa","cc"]
print_0print_1print_2print_3print_4
[“aa”,“bb”][“bbb”][""][“a”,"",“b”][“aa”,“cc”]

12.205 - sqrt()

Learn how to use the sqrt() function to return the square root of the input.

Returns the square root of the input.

Syntax

sqrt(number)

Parameters

NameTypeRequiredDescription
numberint, long, or real✔️The number for which to calculate the square root.

Returns

  • A positive number such that sqrt(x) * sqrt(x) == x
  • null if the argument is negative or can’t be converted to a real value.
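For example, the following query returns 16:

print result = sqrt(256)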

12.206 - startofday()

Learn how to use the startofday() function to return the start of the day for the given date.

Returns the start of the day containing the date, shifted by an offset, if provided.

Syntax

startofday(date [, offset ])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to find the start.
offsetintThe number of days to offset from the input date. The default is 0.

Returns

A datetime representing the start of the day for the given date value, with the offset, if specified.

Example

range offset from -1 to 1 step 1
| project dayStart = startofday(datetime(2017-01-01 10:10:17), offset) 

Output

dayStart
2016-12-31 00:00:00.0000000
2017-01-01 00:00:00.0000000
2017-01-02 00:00:00.0000000

12.207 - startofmonth()

Learn how to use the startofmonth() function to return the start of the month for the given date.

Returns the start of the month containing the date, shifted by an offset, if provided.

Syntax

startofmonth(date [, offset ])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to find the start of month.
offsetintThe number of months to offset from the input date. The default is 0.

Returns

A datetime representing the start of the month for the given date value, with the offset, if specified.

Example

range offset from -1 to 1 step 1
| project monthStart = startofmonth(datetime(2017-01-01 10:10:17), offset) 

Output

monthStart
2016-12-01 00:00:00.0000000
2017-01-01 00:00:00.0000000
2017-02-01 00:00:00.0000000

12.208 - startofweek()

Learn how to use the startofweek() function to return the start of the week for the given date.

Returns the start of the week containing the date, shifted by an offset, if provided.

Start of the week is considered to be a Sunday.

Syntax

startofweek(date [, offset ])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to find the start of week.
offsetintThe number of weeks to offset from the input date. The default is 0.

Returns

A datetime representing the start of the week for the given date value, with the offset, if specified.

Example

range offset from -1 to 1 step 1
| project weekStart = startofweek(datetime(2017-01-01 10:10:17), offset) 

Output

weekStart
2016-12-25 00:00:00.0000000
2017-01-01 00:00:00.0000000
2017-01-08 00:00:00.0000000

12.209 - startofyear()

Learn how to use the startofyear() function to return the start of the year for the given date.

Returns the start of the year containing the date, shifted by an offset, if provided.

Syntax

startofyear(date [, offset ])

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to find the start of the year.
offsetintThe number of years to offset from the input date. The default is 0.

Returns

A datetime representing the start of the year for the given date value, with the offset, if specified.

Example

range offset from -1 to 1 step 1
| project yearStart = startofyear(datetime(2017-01-01 10:10:17), offset) 

Output

yearStart
2016-01-01 00:00:00.0000000
2017-01-01 00:00:00.0000000
2018-01-01 00:00:00.0000000

12.210 - strcat_array()

Learn how to use the strcat_array() function to create a concatenated string of array values using a specified delimiter.

Creates a concatenated string of array values using a specified delimiter.

Syntax

strcat_array(array, delimiter)

Parameters

NameTypeRequiredDescription
arraydynamic✔️An array of values to be concatenated.
delimiterstring✔️The value used to concatenate the values in array.

Returns

The input array values concatenated to a single string with the specified delimiter.

Examples

Custom delimiter

print str = strcat_array(dynamic([1, 2, 3]), "->")

Output

str
1->2->3

Using quotes as the delimiter

To use quotes as the delimiter, enclose the quotes in single quotes.

print str = strcat_array(dynamic([1, 2, 3]), '"')

Output

str
1"2"3

12.211 - strcat_delim()

Learn how to use the strcat_delim() function to concatenate between 2 and 64 arguments using a specified delimiter as the first argument.

Concatenates between 2 and 64 arguments, using a specified delimiter as the first argument.

Syntax

strcat_delim(delimiter, argument1, argument2[ , argumentN])

Parameters

NameTypeRequiredDescription
delimiterstring✔️The string to be used as separator in the concatenation.
argument1…argumentNscalar✔️The expressions to concatenate.

Returns

The arguments concatenated to a single string with delimiter.

Example

print st = strcat_delim('-', 1, '2', 'A', 1s)

Output

st
1-2-A-00:00:01

12.212 - strcat()

Learn how to use the strcat() function to concatenate between 1 and 64 arguments.

Concatenates between 1 and 64 arguments.

Syntax

strcat(argument1, argument2 [, argument3 … ])

Parameters

NameTypeRequiredDescription
argument1…argumentNscalar✔️The expressions to concatenate.

Returns

The arguments concatenated to a single string.

Examples

Concatenated string

The following example uses the strcat() function to concatenate the strings provided to form the string, “hello world.” The results are assigned to the variable str.

print str = strcat("hello", " ", "world")

Output

str
hello world

Concatenated multi-line string

The following example uses the strcat() function to create a concatenated multi-line string which is saved to the variable, MultiLineString. It uses the newline character to break the string into new lines.

print MultiLineString = strcat("Line 1\n", "Line 2\n", "Line 3")

Output

The results show the expanded row view with the multiline string.

MultiLineString
1. “MultiLineString”: Line 1
2. Line 2
3. Line 3

12.213 - strcmp()

Learn how to use the strcmp() function to compare two strings.

Compares two strings.

The function starts by comparing the first character of each string. If they're equal to each other, it continues with the following pairs until the characters differ or until the end of the shorter string is reached.

Syntax

strcmp(string1, string2)

Parameters

NameTypeRequiredDescription
string1string✔️The first input string for comparison.
string2string✔️The second input string for comparison.

Returns

Returns an integer value indicating the relationship between the strings:

  • <0 - the first character that doesn’t match has a lower value in string1 than in string2
  • 0 - the contents of both strings are equal
  • >0 - the first character that doesn’t match has a greater value in string1 than in string2

Example

datatable(string1:string, string2:string) [
    "ABC","ABC",
    "abc","ABC",
    "ABC","abc",
    "abcde","abc"
]
| extend result = strcmp(string1,string2)

Output

string1string2result
ABCABC0
abcABC1
ABCabc-1
abcdeabc1

12.214 - string_size()

Learn how to use the string_size() function to measure the size of the input string.

Returns the size, in bytes, of the input string.

Syntax

string_size(source)

Parameters

NameTypeRequiredDescription
sourcestring✔️The string for which to return the byte size.

Returns

Returns the length, in bytes, of the input string.

Examples

String of letters

print size = string_size("hello")

Output

size
5

String of letters and symbols

print size = string_size("⒦⒰⒮⒯⒪")

Output

size
15

12.215 - strlen()

Learn how to use the strlen() function to measure the length of the input string.

Returns the length, in characters, of the input string.

Syntax

strlen(source)

Parameters

NameTypeRequiredDescription
sourcestring✔️The string for which to return the length.

Returns

Returns the length, in characters, of the input string.

Examples

String of letters

print length = strlen("hello")

Output

length
5

String of letters and symbols

print length = strlen("⒦⒰⒮⒯⒪")

Output

length
5

String with grapheme

print strlen('Çedilla') // the first character is a grapheme cluster
                        // that requires 2 code points to represent

Output

length
8

12.216 - strrep()

Learn how to use the strrep() function to repeat the input value.

Replicates a string the number of times specified.

Syntax

strrep(value, multiplier, [ delimiter ])

Parameters

NameTypeRequiredDescription
valuestring✔️The string to replicate.
multiplierint✔️The amount of times to replicate the string. Must be a value from 1 to 67108864.
delimiterstringThe delimiter used to separate the string replications. The default delimiter is an empty string.

Returns

The value string repeated the number of times as specified by multiplier, concatenated with delimiter.

If multiplier is greater than the maximal allowed value of 67108864, the input string is repeated 67108864 times.

Example

print from_str = strrep('ABC', 2), from_int = strrep(123,3,'.'), from_time = strrep(3s,2,' ')

Output

from_strfrom_intfrom_time
ABCABC123.123.12300:00:03 00:00:03

12.217 - substring()

Learn how to use the substring() function to extract a substring from the source string.

Extracts a substring from the source string starting from some index to the end of the string.

Optionally, the length of the requested substring can be specified.

Syntax

substring(source, startingIndex [, length])

Parameters

NameTypeRequiredDescription
sourcestring✔️The string from which to take the substring.
startingIndexint✔️The zero-based starting character position of the requested substring. If a negative number, the substring will be retrieved from the end of the source string.
lengthintThe requested number of characters in the substring. The default behavior is to take from startingIndex to the end of the source string.

Returns

A substring from the given string. The substring starts at startingIndex (zero-based) character position and continues to the end of the string or length characters if specified.

Examples

substring("123456", 1)        // 23456
substring("123456", 2, 2)     // 34
substring("ABCD", 0, 2)       // AB
substring("123456", -2, 2)    // 56

12.218 - tan()

Learn how to use the tan() function to return the tangent value of the specified number.

Returns the tangent value of the specified number.

Syntax

tan(x)

Parameters

NameTypeRequiredDescription
xreal✔️The number for which to calculate the tangent.

Returns

The result of tan(x).
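For example, the following query calculates the tangent of 0.5 radians (approximately 0.5463):

print result = tan(0.5)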

12.219 - The has_any_index operator

Learn how to use the has_any_index operator to search the input string for items specified in the array.

Searches the string for items specified in the array and returns the position in the array of the first item found in the string.

Syntax

has_any_index (source, values)

Parameters

NameTypeRequiredDescription
sourcestring✔️The value to search.
valuesdynamic✔️An array of scalar or literal expressions to look up.

Returns

Zero-based index position of the first item in values that is found in source. Returns -1 if none of the array items were found in the string or if values is empty.

Example

print
 idx1 = has_any_index("this is an example", dynamic(['this', 'example']))  // first lookup found in input string
 , idx2 = has_any_index("this is an example", dynamic(['not', 'example'])) // last lookup found in input string
 , idx3 = has_any_index("this is an example", dynamic(['not', 'found'])) // no lookup found in input string
 , idx4 = has_any_index("Example number 2", range(1, 3, 1)) // Lookup array of integers
 , idx5 = has_any_index("this is an example", dynamic([]))  // Empty lookup array

Output

idx1idx2idx3idx4idx5
01-11-1

12.220 - tobool()

Learn how to use the tobool() function to convert an input to a boolean representation.

Convert inputs to boolean (signed 8-bit) representation.

Syntax

tobool(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The value to convert to boolean.

Returns

If conversion is successful, result will be a boolean. If conversion isn’t successful, result will be null.

Example

tobool("true") == true
tobool("false") == false
tobool(1) == true
tobool(123) == true
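Run as a query, the same conversions can be verified with print:

print b1 = tobool("true"), b2 = tobool("false"), b3 = tobool(1), b4 = tobool(123)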

12.221 - todatetime()

Learn how to use the todatetime() function to convert the input expression to a datetime value.

Converts the input to a datetime scalar value.

Syntax

todatetime(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to convert to datetime.

Returns

If the conversion is successful, the result will be a datetime value. Else, the result will be null.

Example

The following example converts a date and time string into a datetime value.

print todatetime("2015-12-31 23:59:59.9")

The following example compares a converted date string to a datetime value.

print todatetime('12-02-2022') == datetime('12-02-2022')

Output

print_0
true

12.222 - todecimal()

Learn how to use the todecimal() function to convert the input expression to a decimal number representation.

Converts the input to a decimal number representation.

Syntax

todecimal(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to convert to a decimal.

Returns

If conversion is successful, result will be a decimal number. If conversion isn’t successful, result will be null.

Example

print todecimal("123.45678") == decimal(123.45678)

Output

print_0
true

12.223 - toguid()

Learn how to use the toguid() function to convert the input string to a guid scalar.

Converts a string to a guid scalar.

Syntax

toguid(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to convert to guid.

Returns

The conversion process takes the first 32 characters of the input, ignoring properly located hyphens, validates that the characters are between 0-9 or a-f, and then converts the string into a guid scalar. The rest of the string is ignored.

If the conversion is successful, the result will be a guid scalar. Otherwise, the result will be null.

Example

datatable(str: string)
[
    "0123456789abcdef0123456789abcdef",
    "0123456789ab-cdef-0123-456789abcdef",
    "a string that is not a guid"
]
| extend guid = toguid(str)

Output

strguid
0123456789abcdef0123456789abcdef01234567-89ab-cdef-0123-456789abcdef
0123456789ab-cdef-0123-456789abcdef01234567-89ab-cdef-0123-456789abcdef
a string that is not a guid

12.224 - tohex()

Learn how to use the tohex() function to convert the input value to a hexadecimal string.

Converts input to a hexadecimal string.

Syntax

tohex(value, [, minLength ])

Parameters

NameTypeRequiredDescription
valueint or long✔️The value that will be converted to a hex string.
minLengthintThe value representing the number of leading characters to include in the output. Values between 1 and 16 are supported. Values greater than 16 will be truncated to 16. If the string is longer than minLength without leading characters, then minLength is effectively ignored. Negative numbers may only be represented at minimum by their underlying data size, so for an integer (32-bit) the minLength will be at minimum 8, for a long (64-bit) it will be at minimum 16.

Returns

If conversion is successful, result will be a string value. If conversion isn’t successful, result will be null.

Example

print
    tohex(256) == '100',
    tohex(-256) == 'ffffffffffffff00', // 64-bit 2's complement of -256
    tohex(toint(-256), 8) == 'ffffff00', // 32-bit 2's complement of -256
    tohex(256, 8) == '00000100',
    tohex(256, 2) == '100' // Exceeds min length of 2, so min length is ignored.

Output

print_0print_1print_2print_3print_4
truetruetruetruetrue

12.225 - toint()

Learn how to use the toint() function to convert the input value to an integer number representation.

Converts the input to an integer value (signed 32-bit) number representation.

Syntax

toint(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to convert to an integer.

Returns

If the conversion is successful, the result is an integer. Otherwise, the result is null. If the input includes a decimal value, the result is truncated to the integer portion.

Example

Convert string to integer

The following example converts a string to an integer and checks if the converted value is equal to a specific integer.

print toint("123") == 123
| project Integer = print_0

Output

Integer
true

Truncated integer

The following example inputs a decimal value and returns a truncated integer.

print toint(2.3)
| project Integer = print_0

Output

Integer
2

12.226 - tolong()

Learn how to use the tolong() function to convert the input value to a long number representation.

Converts the input value to a long (signed 64-bit) number representation.

Syntax

tolong(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to convert to a long.

Returns

If conversion is successful, the result is a long number. If conversion isn’t successful, the result is null.

Example

tolong("123") == 123

12.227 - tolower()

Learn how to use the tolower() function to convert the input string to lower case.

Converts the input string to lower case.

Syntax

tolower(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The value to convert to a lowercase string.

Returns

If conversion is successful, result is a lowercase string. If conversion isn’t successful, result is null.

Example

tolower("Hello") == "hello"

12.228 - toreal()

Learn how to use the toreal() function to convert the input expression to a value of type real.

Converts the input expression to a value of type real.

Syntax

toreal(Expr)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to convert to real.

Returns

If conversion is successful, the result is a value of type real. Otherwise, the returned value will be real(null).

Example

toreal("123.4") == 123.4

12.229 - tostring()

Learn how to use the tostring() function to convert the input value to a string representation.

Converts the input to a string representation.

Syntax

tostring(value)

Parameters

NameTypeRequiredDescription
valuescalar✔️The value to convert to a string.

Returns

If value is non-null, the result is a string representation of value. If value is null, the result is an empty string.

Example

print tostring(123)

12.230 - totimespan()

Learn how to use the totimespan() function to convert the input to a timespan scalar value.

Converts the input to a timespan scalar value.

Syntax

totimespan(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The value to convert to a timespan.

Returns

If conversion is successful, result will be a timespan value. Else, result will be null.

Example

totimespan("0.00:01:00") == time(1min)

12.231 - toupper()

Learn how to use the toupper() function to convert a string to upper case.

Converts a string to upper case.

Syntax

toupper(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The value to convert to an uppercase string.

Returns

If conversion is successful, result is an uppercase string. If conversion isn’t successful, result is null.

Example

toupper("hello") == "HELLO"

12.232 - translate()

Learn how to use the translate() function to replace a set of characters with another set of characters in a given string.

Replaces a set of characters (‘searchList’) with another set of characters (‘replacementList’) in a given a string. The function searches for characters in the ‘searchList’ and replaces them with the corresponding characters in ‘replacementList’

Syntax

translate(searchList, replacementList, source)

Parameters

NameTypeRequiredDescription
searchListstring✔️The list of characters that should be replaced.
replacementListstring✔️The list of characters that should replace the characters in searchList.
sourcestring✔️A string to search.

Returns

source after replacing all occurrences of characters in 'searchList' with the corresponding characters in 'replacementList'.

Examples

InputOutput
translate("abc", "x", "abc")"xxx"
translate("abc", "", "ab")""
translate("krasp", "otsku", "spark")"kusto"

12.233 - treepath()

This article describes treepath().

Enumerates all the path expressions that identify leaves in a dynamic object.

Syntax

treepath(object)

Parameters

NameTypeRequiredDescription
objectdynamic✔️A dynamic property bag object for which to enumerate the path expressions.

Returns

An array of path expressions.

Examples

ExpressionEvaluates to
treepath(parse_json('{"a":"b", "c":123}'))["['a']","['c']"]
treepath(parse_json('{"prop1":[1,2,3,4], "prop2":"value2"}'))["['prop1']","['prop1'][0]","['prop2']"]
treepath(parse_json('{"listProperty":[100,200,300,"abcde",{"x":"y"}]}'))["['listProperty']","['listProperty'][0]","['listProperty'][0]['x']"]

12.234 - trim_end()

Learn how to use the trim_end() function to remove the trailing match of the specified regular expression.

Removes trailing match of the specified regular expression.

Syntax

trim_end(regex, source)

Parameters

NameTypeRequiredDescription
regexstring✔️The string or regular expression to be trimmed from the end of source.
sourcestring✔️The source string from which to trim regex.

Returns

source after trimming matches of regex found in the end of source.

Examples

The following statement trims substring from the end of string_to_trim.

let string_to_trim = @"bing.com";
let substring = ".com";
print string_to_trim = string_to_trim,trimmed_string = trim_end(substring,string_to_trim)

Output

string_to_trimtrimmed_string
bing.combing

Trim non-alphanumeric characters

The following example trims all non-word characters from the end of the string.

range x from 1 to 5 step 1
| project str = strcat("-  ","Te st",x,@"// $")
| extend trimmed_str = trim_end(@"[^\w]+",str)

Output

strtrimmed_str
- Te st1// $- Te st1
- Te st2// $- Te st2
- Te st3// $- Te st3
- Te st4// $- Te st4
- Te st5// $- Te st5

Trim whitespace

The following example trims all spaces from the end of the string.

let string_to_trim = @"    Hello, world!    ";
let substring = @"\s+";
print
    string_to_trim = string_to_trim,
    trimmed_end = trim_end(substring, string_to_trim)

Output

string_to_trimtrimmed_end
Hello, world!Hello, world!

12.235 - trim_start()

Learn how to use the trim_start() function to remove the leading match of the specified regular expression.

Removes leading match of the specified regular expression.

Syntax

trim_start(regex, source)

Parameters

NameTypeRequiredDescription
regexstring✔️The string or regular expression to be trimmed from the beginning of source.
sourcestring✔️The source string from which to trim regex.

Returns

source after trimming match of regex found in the beginning of source.

Examples

Trim specific substring

The following example trims substring from the start of string_to_trim.

let string_to_trim = @"https://bing.com";
let substring = "https://";
print string_to_trim = string_to_trim,trimmed_string = trim_start(substring,string_to_trim)

Output

string_to_trimtrimmed_string
https://bing.combing.com

Trim non-alphanumeric characters

The following example trims all non-word characters from the beginning of the string.

range x from 1 to 5 step 1
| project str = strcat("-  ","Te st",x,@"// $")
| extend trimmed_str = trim_start(@"[^\w]+",str)

Output

strtrimmed_str
- Te st1// $Te st1// $
- Te st2// $Te st2// $
- Te st3// $Te st3// $
- Te st4// $Te st4// $
- Te st5// $Te st5// $

Trim whitespace

The following example trims all spaces from the start of the string.

let string_to_trim = @"    Hello, world!    ";
let substring = @"\s+";
print
    string_to_trim = string_to_trim,
    trimmed_start = trim_start(substring, string_to_trim)

Output

string_to_trimtrimmed_start
Hello, world!Hello, world!

12.236 - trim()

Learn how to use the trim() function to remove the leading and trailing match of the specified regular expression.

Removes all leading and trailing matches of the specified regular expression.

Syntax

trim(regex, source)

Parameters

NameTypeRequiredDescription
regexstring✔️The string or regular expression to be trimmed from source.
sourcestring✔️The source string from which to trim regex.

Returns

source after trimming matches of regex found in the beginning and/or the end of source.

Examples

Trim specific substring

The following example trims substring from the start and the end of the string_to_trim.

let string_to_trim = @"--https://bing.com--";
let substring = "--";
print string_to_trim = string_to_trim, trimmed_string = trim(substring,string_to_trim)

Output

string_to_trimtrimmed_string
--https://bing.com--https://bing.com

Trim non-alphanumeric characters

The following example trims all non-word characters from start and end of the string.

range x from 1 to 5 step 1
| project str = strcat("-  ","Te st",x,@"// $")
| extend trimmed_str = trim(@"[^\w]+",str)

Output

strtrimmed_str
- Te st1// $Te st1
- Te st2// $Te st2
- Te st3// $Te st3
- Te st4// $Te st4
- Te st5// $Te st5

Trim whitespaces

The next statement trims all spaces from start and end of the string.

let string_to_trim = @"    Hello, world!    ";
let substring = @"\s+";
print
    string_to_trim = string_to_trim,
    trimmed_string = trim(substring, string_to_trim)

Output

string_to_trimtrimmed_string
Hello, world!Hello, world!

12.237 - unicode_codepoints_from_string()

Learn how to use the unicode_codepoints_from_string() function to return a dynamic array of the Unicode codepoints of the input string.

Returns a dynamic array of the Unicode codepoints of the input string. This function is the inverse operation of unicode_codepoints_to_string() function.

Syntax

unicode_codepoints_from_string(value)

Parameters

NameTypeRequiredDescription
valuestring✔️The source string to convert.

Returns

Returns a dynamic array of the Unicode codepoints of the characters that make up the string provided to this function. See also unicode_codepoints_to_string().

Examples

print arr = unicode_codepoints_from_string("⒦⒰⒮⒯⒪")

Output

arr
[9382, 9392, 9390, 9391, 9386]
print arr = unicode_codepoints_from_string("קוסטו - Kusto")

Output

arr
[1511, 1493, 1505, 1496, 1493, 32, 45, 32, 75, 117, 115, 116, 111]
print str = unicode_codepoints_to_string(unicode_codepoints_from_string("Kusto"))

Output

str
Kusto

12.238 - unicode_codepoints_to_string()

Learn how to use the unicode_codepoints_to_string() function to return the string represented by the Unicode codepoints.

Returns the string represented by the Unicode codepoints. This function is the inverse operation of unicode_codepoints_from_string() function.

Syntax

unicode_codepoints_to_string (values)

Parameters

NameTypeRequiredDescription
valuesint, long, or dynamic✔️One or more comma-separated values to convert. The values may also be a dynamic array.

Returns

Returns the string made of the UTF characters whose Unicode codepoint value is provided by the arguments to this function. The input must consist of valid Unicode codepoints. If any argument isn’t a valid Unicode codepoint, the function returns null.

Examples

print str = unicode_codepoints_to_string(75, 117, 115, 116, 111)

Output

str
Kusto
print str = unicode_codepoints_to_string(dynamic([75, 117, 115, 116, 111]))

Output

str
Kusto
print str = unicode_codepoints_to_string(dynamic([75, 117, 115]), 116, 111)

Output

str
Kusto
print str = unicode_codepoints_to_string(75, 10, 117, 10, 115, 10, 116, 10, 111)

Output

str
K
u
s
t
o
print str = unicode_codepoints_to_string(range(48,57), range(65,90), range(97,122))

Output

str
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

12.239 - unixtime_microseconds_todatetime()

Learn how to use the unixtime_microseconds_todatetime() function to convert unix-epoch microseconds to UTC datetime.

Converts unix-epoch microseconds to UTC datetime.

Syntax

unixtime_microseconds_todatetime(microseconds)

Parameters

NameTypeRequiredDescription
microsecondsreal✔️The epoch timestamp in microseconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value.

Returns

If the conversion is successful, the result is a datetime value. Otherwise, the result is null.

Example

print date_time = unixtime_microseconds_todatetime(1546300800000000)

Output

date_time
2019-01-01 00:00:00.0000000

12.240 - unixtime_milliseconds_todatetime()

Learn how to use the unixtime_milliseconds_todatetime() function to convert unix-epoch milliseconds to UTC datetime.

Converts unix-epoch milliseconds to UTC datetime.

Syntax

unixtime_milliseconds_todatetime(milliseconds)

Parameters

NameTypeRequiredDescription
millisecondsreal✔️The epoch timestamp in milliseconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value.

Returns

If the conversion is successful, the result is a datetime value. Otherwise, the result is null.

Example

print date_time = unixtime_milliseconds_todatetime(1546300800000)

Output

date_time
2019-01-01 00:00:00.0000000

12.241 - unixtime_nanoseconds_todatetime()

Learn how to use the unixtime_nanoseconds_todatetime() function to convert unix-epoch nanoseconds to UTC datetime.

Converts unix-epoch nanoseconds to UTC datetime.

Syntax

unixtime_nanoseconds_todatetime(nanoseconds)

Parameters

NameTypeRequiredDescription
nanosecondsreal✔️The epoch timestamp in nanoseconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value.

Returns

If the conversion is successful, the result is a datetime value. Otherwise, the result is null.

Example

print date_time = unixtime_nanoseconds_todatetime(1546300800000000000)

Output

date_time
2019-01-01 00:00:00.0000000

12.242 - unixtime_seconds_todatetime()

Learn how to use the unixtime_seconds_todatetime() function to convert unix-epoch seconds to UTC datetime.

Converts unix-epoch seconds to UTC datetime.

Syntax

unixtime_seconds_todatetime(seconds)

Parameters

NameTypeRequiredDescription
secondsreal✔️The epoch timestamp in seconds. A datetime value that occurs before the epoch time (1970-01-01 00:00:00) has a negative timestamp value.

Returns

If the conversion is successful, the result is a datetime value. Otherwise, the result is null.

Example

print date_time = unixtime_seconds_todatetime(1546300800)

Output

date_time
2019-01-01 00:00:00.0000000

12.243 - url_decode()

Learn how to use the url_decode() function to convert an encoded URL into a regular URL representation.

The function converts an encoded URL into a regular URL representation.

For more information about URL encoding and decoding, see Percent-encoding.

Syntax

url_decode(encoded_url)

Parameters

NameTypeRequiredDescription
encoded_urlstring✔️The encoded URL to decode.

Returns

URL (string) in a regular representation.

Example

let url = @'https%3a%2f%2fwww.bing.com%2f';
print original = url, decoded = url_decode(url)

Output

originaldecoded
https%3a%2f%2fwww.bing.com%2fhttps://www.bing.com/

12.244 - url_encode_component()

Learn how to use the url_encode_component() function to convert characters of the input URL into a transmittable format.

The function converts characters of the input URL into a format that can be transmitted over the internet. Differs from url_encode by encoding spaces as ‘%20’ and not as ‘+’.

For more information about URL encoding and decoding, see Percent-encoding.

Syntax

url_encode_component(url)

Parameters

NameTypeRequiredDescription
urlstring✔️The URL to encode.

Returns

URL (string) converted into a format that can be transmitted over the Internet.

Example

let url = @'https://www.bing.com/hello world/';
print original = url, encoded = url_encode_component(url)

Output

originalencoded
https://www.bing.com/hello world/https%3a%2f%2fwww.bing.com%2fhello%20world

12.245 - url_encode()

Learn how to use the url_encode() function to convert characters of the input URL into a transmittable format.

The function converts characters of the input URL into a format that can be transmitted over the internet. Differs from url_encode_component by encoding spaces as ‘+’ and not as ‘%20’ (see application/x-www-form-urlencoded here).

For more information about URL encoding and decoding, see Percent-encoding.

Syntax

url_encode(url)

Parameters

NameTypeRequiredDescription
urlstring✔️The URL to encode.

Returns

URL (string) converted into a format that can be transmitted over the Internet.

Examples

let url = @'https://www.bing.com/hello world';
print original = url, encoded = url_encode(url)

Output

originalencoded
https://www.bing.com/hello worldhttps%3a%2f%2fwww.bing.com%2fhello+world

12.246 - week_of_year()

Learn how to use the week_of_year() function to get the integer representation of the week.

Returns an integer that represents the week number. The week number is calculated from the first week of a year, which is the one that includes the first Thursday, according to ISO 8601.

Deprecated aliases: weekofyear()

Syntax

week_of_year(date)

Parameters

NameTypeRequiredDescription
datedatetime✔️The date for which to return the week of the year.

Returns

week number - The week number that contains the given date.

Examples

InputOutput
week_of_year(datetime(2020-12-31))53
week_of_year(datetime(2020-06-15))25
week_of_year(datetime(1970-01-01))1
week_of_year(datetime(2000-01-01))52

The current version of this function, week_of_year(), is ISO 8601 compliant; the first week of a year is defined as the week with the year’s first Thursday in it.

12.247 - welch_test()

Learn how to use the welch_test() function to compute the p_value of the Welch-test.

Computes the p-value of the Welch-test function.

Syntax

welch_test(mean1, variance1, count1, mean2, variance2, count2)

Parameters

NameTypeRequiredDescription
mean1real or long✔️The mean (average) value of the first series.
variance1real or long✔️The variance value of the first series.
count1real or long✔️The count of values in the first series.
mean2real or long✔️The mean (average) value of the second series.
variance2real or long✔️The variance value of the second series.
count2real or long✔️The count of values in the second series.

Returns

From Wikipedia:

In statistics, Welch’s t-test is a two-sample location test that’s used to test the hypothesis that two populations have equal means. Welch’s t-test is an adaptation of Student’s t-test, and is more reliable when the two samples have unequal variances and unequal sample sizes. These tests are often referred to as “unpaired” or “independent samples” t-tests. The tests are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Welch’s t-test is less popular than Student’s t-test, and may be less familiar to readers. The test is also called “Welch’s unequal variances t-test”, or “unequal variances t-test”.

Example

// s1, s2 values are from https://en.wikipedia.org/wiki/Welch%27s_t-test
print
    s1 = dynamic([27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4]),
    s2 = dynamic([27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2, 21.9, 22.1, 22.9, 20.5, 24.4])
| mv-expand s1 to typeof(double), s2 to typeof(double)
| summarize m1=avg(s1), v1=variance(s1), c1=count(), m2=avg(s2), v2=variance(s2), c2=count()
| extend pValue=welch_test(m1,v1,c1,m2,v2,c2)
// pValue = 0.021

12.248 - zip()

This article describes zip().

The zip function accepts any number of dynamic arrays, and returns an array whose elements are each an array holding the elements of the input arrays of the same index.

Syntax

zip(arrays)

Parameters

NameTypeRequiredDescription
arraysdynamic✔️The dynamic array values to zip. The function accepts between 2 and 16 arrays.

Examples

print zip(dynamic([1,3,5]), dynamic([2,4,6]))

Output

print_0
[[1,2],[3,4],[5,6]]
print zip(dynamic(["A", 1, 1.5]), dynamic([{}, "B"]))

Output

print_0
[["A",{}], [1,"B"], [1.5, null]]
datatable(a:int, b:string) [1,"one",2,"two",3,"three"]
| summarize a = make_list(a), b = make_list(b)
| project zip(a, b)

Output

print_0
[[1,"one"],[2,"two"],[3,"three"]]

12.249 - zlib_compress_to_base64_string

This article describes the zlib_compress_to_base64_string() command.

Performs zlib compression and encodes the result to base64.

Syntax

zlib_compress_to_base64_string(string)

Parameters

NameTypeRequiredDescription
stringstring✔️The string to be compressed and base64 encoded.

Returns

  • Returns a string that represents the zlib-compressed and base64-encoded original string.
  • Returns an empty result if compression or encoding failed.

Example

Using Kusto Query Language

print zcomp = zlib_compress_to_base64_string("1234567890qwertyuiop")

Output

zcomp
“eAEBFADr/zEyMzQ1Njc4OTBxd2VydHl1aW9wOAkGdw==”

Using Python

Compression can be done using other tools, for example Python.

import base64, zlib
print(base64.b64encode(zlib.compress(b'<original_string>')))

12.250 - zlib_decompress_from_base64_string()

This article describes the zlib_decompress_from_base64_string() command.

Decodes the input string from base64 and performs zlib decompression.

Syntax

zlib_decompress_from_base64_string(string)

Parameters

NameTypeRequiredDescription
stringstring✔️The string to decode. The string should have been compressed with zlib and then base64-encoded.

Returns

  • Returns a string that represents the original string.
  • Returns an empty result if decompression or decoding failed.
    • For example, invalid zlib-compressed and base64-encoded strings will return an empty output.

Examples

Valid input

print zcomp = zlib_decompress_from_base64_string("eJwLSS0uUSguKcrMS1cwNDIGACxqBQ4=")

Output

zcomp
Test string 123

Invalid input

print zcomp = zlib_decompress_from_base64_string("x0x0x0")

Output

zcomp

13 - Scalar operators

13.1 - Bitwise (binary) operators

This article lists the bitwise (binary) operators supported in the Kusto Query Language.

Kusto supports several bitwise (binary) operators between integers.
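
A minimal sketch, assuming the binary_and(), binary_or(), and binary_xor() scalar functions are available:

print and_result = binary_and(5, 3), or_result = binary_or(5, 3), xor_result = binary_xor(5, 3)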

13.2 - Datetime / timespan arithmetic

This article describes Datetime / timespan arithmetic.

Kusto supports performing arithmetic operations on values of types datetime and timespan.

Supported operations

  • One can subtract (but not add) two datetime values to get a timespan value expressing their difference. For example, datetime(1997-06-25) - datetime(1910-06-11) is how old Jacques-Yves Cousteau was when he died.

  • One can add or subtract two timespan values to get a timespan value which is their sum or difference. For example, 1d + 2d is three days.

  • One can add or subtract a timespan value from a datetime value. For example, datetime(1910-06-11) + 1d is the date Cousteau turned one day old.

  • One can divide two timespan values to get their quotient. For example, 1d / 5h gives 4.8. This gives one the ability to express any timespan value as a multiple of another timespan value. For example, to express an hour in seconds, simply divide 1h by 1s: 1h / 1s (with the obvious result, 3600).

  • Conversely, one can multiply a numeric value (such as double and long) by a timespan value to get a timespan value. For example, one can express an hour and a half as 1.5 * 1h (see the sketch after this list).
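
For instance, several of these operations can be combined in a single print statement (a minimal sketch):

print cousteauAge = datetime(1997-06-25) - datetime(1910-06-11), hourInSeconds = 1h / 1s, ninetyMinutes = 1.5 * 1h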

Examples

Unix time, which is also known as POSIX time or UNIX Epoch time, is a system for describing a point in time as the number of seconds that have elapsed since 00:00:00 Thursday, 1 January 1970, Coordinated Universal Time (UTC), minus leap seconds.

If your data includes representation of Unix time as an integer, or you require converting to it, the following functions are available.

From Unix time

let fromUnixTime = (t: long) { 
    datetime(1970-01-01) + t * 1sec 
};
print result = fromUnixTime(1546897531)

Output

result
2019-01-07 21:45:31.0000000

To Unix time

let toUnixTime = (dt: datetime) { 
    (dt - datetime(1970-01-01)) / 1s 
};
print result = toUnixTime(datetime(2019-01-07 21:45:31.0000000))

Output

result
1546897531

For unix-epoch time conversions, see the following functions:
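
A minimal sketch, assuming the built-in unixtime_seconds_todatetime() function is available; it should produce the same result as the fromUnixTime() example above:

print result = unixtime_seconds_todatetime(1546897531)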

13.3 - Logical (binary) operators

Learn how to use Logical (binary) operators to return a Boolean result.

The following logical operators can be used to perform comparisons and evaluations:

Operator nameSyntaxMeaning
Equality==Returns true if both operands are non-null and equal to each other. Otherwise, returns false.
Inequality!=Returns true if any of the operands are null or if the operands aren’t equal to each other. Otherwise, returns false.
Logical andandReturns true only if both operands are true. The logical and has higher precedence than the logical or.
Logical ororReturns true if either of the operands is true, regardless of the other operand.

How logical operators work with null values

Null values adhere to the following rules:

OperationResult
bool(null) == bool(null)false
bool(null) != bool(null)false
bool(null) and truefalse
bool(null) or truetrue
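
Following the pattern of the Null values example below, all of these rules can be checked in one query (a minimal sketch):

print eq = iff(bool(null) == bool(null), true, false),
      ne = iff(bool(null) != bool(null), true, false),
      and_true = iff(bool(null) and true, true, false),
      or_true = iff(bool(null) or true, true, false)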

Examples

Equality

The following query returns a count of all storm events where the event type is “Tornado”.

StormEvents
| where EventType == "Tornado"
| count

Output

Count
1238

Inequality

The following query returns a count of all storm events where the event type isn’t “Tornado”.

StormEvents
| where EventType != "Tornado"
| count

Output

Count
57828

Logical and

The following query returns a count of all storm events where the event type is “Tornado” and the state is “KANSAS”.

StormEvents
| where EventType == "Tornado" and State == "KANSAS"
| count

Output

Count
161

Logical or

The following query returns a count of all storm events where the event type is “Tornado” or “Thunderstorm Wind”.

StormEvents
| where EventType == "Tornado" or EventType == "Thunderstorm Wind"
| count

Output

Count
14253

Null values

The following query shows that null values are treated as false.

print print=iff(bool(null) and true, true, false)

Output

print
false

13.4 - Numerical operators

Learn how to use numerical operators to calculate the value from two or more numbers.

The types int, long, and real represent numerical types. The following operators can be used between pairs of these types:

OperatorDescriptionExample
+Add3.14 + 3.14, ago(5m) + 5m
-Subtract0.23 - 0.22,
*Multiply1s * 5, 2 * 2
/Divide10m / 1s, 4 / 2
%Modulo4 % 2
<Less1 < 10, 10sec < 1h, now() < datetime(2100-01-01)
>Greater0.23 > 0.22, 10min > 1sec, now() > ago(1d)
==Equals1 == 1
!=Not equals1 != 0
<=Less or Equal4 <= 5
>=Greater or Equal5 >= 4
inEquals to one of the elementssee here
!inNot equals to any of the elementssee here

Type rules for arithmetic operations

The data type of the result of an arithmetic operation is determined by the data types of the operands. If one of the operands is of type real, the result will be of type real. If both operands are of integer types (int or long), the result will be of type long.

Due to these rules, the result of division operations that only involve integers will be truncated to an integer, which might not always be what you want. To avoid truncation, convert at least one of the integer values to real using the todouble() function before performing the operation.

The following examples illustrate how the operand types affect the result type in division operations.

OperationResultDescription
1.0 / 20.5One of the operands is of type real, so the result is real.
1 / 2.00.5One of the operands is of type real, so the result is real.
1 / 20Both of the operands are of type int, so the result is int. Integer division occurs and the decimal is truncated, resulting in 0 instead of 0.5, as one might expect.
real(1) / 20.5To avoid truncation due to integer division, one of the int operands was first converted to real using the real() function.
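
The same behavior can be observed directly (a minimal sketch):

print int_division = 1 / 2, real_division = 1.0 / 2, converted = todouble(1) / 2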

Comment about the modulo operator

In Kusto, the modulo of two numbers always returns a “small non-negative number”. Thus, the modulo of two numbers, N % D, is such that: 0 ≤ (N % D) < abs(D).

For example, the following query:

print plusPlus = 14 % 12, minusPlus = -14 % 12, plusMinus = 14 % -12, minusMinus = -14 % -12

Produces this result:

plusPlusminusPlusplusMinusminusMinus
210210

13.5 - Between operators

13.5.1 - The !between operator

Learn how to use the !between operator to match the input that is outside of the inclusive range.

Matches the input that is outside of the inclusive range.

!between can operate on any numeric, datetime, or timespan expression.

Syntax

T | where expr !between (leftRange..rightRange)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be matched.
exprscalar✔️The expression to filter.
leftRangeint, long, real, or datetime✔️The expression of the left range. The range is inclusive.
rightRangeint, long, real, datetime, or timespan✔️The expression of the right range. The range is inclusive.

This value can only be of type timespan if expr and leftRange are both of type datetime. See example.

Returns

Rows in T for which the predicate of (expr < leftRange or expr > rightRange) evaluates to true.

Examples

Filter numeric values

range x from 1 to 10 step 1
| where x !between (5 .. 9)

Output

x
1
2
3
4
10

Filter datetime

StormEvents
| where StartTime !between (datetime(2007-07-27) .. datetime(2007-07-30))
| count 

Output

Count
58590

Filter datetime using a timespan range

StormEvents
| where StartTime !between (datetime(2007-07-27) .. 3d)
| count 

Output

Count
58590

13.5.2 - The between operator

Learn how to use the between operator to return a record set of values in an inclusive range for which the predicate evaluates to true.

Filters a record set for data matching the values in an inclusive range.

between can operate on any numeric, datetime, or timespan expression.

Syntax

T | where expr between (leftRange..rightRange)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be matched. For example, the table name.
exprscalar✔️The expression used to filter.
leftRangeint, long, real, or datetime✔️The expression of the left range. The range is inclusive.
rightRangeint, long, real, datetime, or timespan✔️The expression of the right range. The range is inclusive.

This value can only be of type timespan if expr and leftRange are both of type datetime. See example.

Returns

Rows in T for which the predicate of (expr >= leftRange and expr <= rightRange) evaluates to true.

Examples

Filter numeric values

range x from 1 to 100 step 1
| where x between (50 .. 55)

Output

x
50
51
52
53
54
55

Filter by date

StormEvents
| where StartTime between (datetime(2007-07-27) .. datetime(2007-07-30))
| count

Output

Count
476

Filter by date and time

StormEvents
| where StartTime between (datetime(2007-12-01T01:30:00) .. datetime(2007-12-01T08:00:00))
| count

Output

Count
301

Filter using a timespan range

StormEvents
| where StartTime between (datetime(2007-07-27) .. 3d)
| count

Output

Count
476

13.6 - in operators

13.6.1 - The case-insensitive !in~ string operator

Learn how to use the !in~ string operator to filter records for data without a case-insensitive string.

Filters a record set for data without a case-insensitive string.

Performance tips

When possible, use the case-sensitive !in.

Syntax

T | where col !in~ (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Example

List of scalars

The following query shows how to use !in~ with a comma-separated list of scalar values.

StormEvents 
| where State !in~ ("Florida", "Georgia", "New York") 
| count

Output

Count
54,291

Dynamic array

The following query shows how to use !in~ with a dynamic array.

StormEvents 
| where State !in~ (dynamic(["Florida", "Georgia", "New York"])) 
| count

Output

Count
54291

The same query can also be written with a let statement.

let states = dynamic(["Florida", "Georgia", "New York"]);
StormEvents 
| where State !in~ (states)
| summarize count() by State

Output

Count
54291

Tabular expression

The following query shows how to use !in~ with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.

StormEvents 
| where State !in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State

Output

Statecount_
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.

let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents 
| where State !in~ (large_states)
| summarize count() by State

Output

Statecount_
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

13.6.2 - The case-insensitive in~ string operator

Learn how to use the in~ operator to filter data with a case-insensitive string.

Filters a record set for data with a case-insensitive string.

Performance tips

When possible, use the case-sensitive in.

Syntax

T | where col in~ (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Examples

List of scalars

The following query shows how to use in~ with a comma-separated list of scalar values.

StormEvents 
| where State in~ ("FLORIDA", "georgia", "NEW YORK") 
| count

Output

Count
4775

Dynamic array

The following query shows how to use in~ with a dynamic array.

StormEvents 
| where State in~ (dynamic(["FLORIDA", "georgia", "NEW YORK"])) 
| count

Output

Count
4775

The same query can also be written with a let statement.

let states = dynamic(["FLORIDA", "georgia", "NEW YORK"]);
StormEvents 
| where State in~ (states)
| summarize count() by State

Output

Count
4775

Tabular expression

The following query shows how to use in~ with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.

StormEvents 
| where State in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State

Output

Statecount_
TEXAS4701
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881

The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.

let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents 
| where State in~ (large_states)
| summarize count() by State

Output

Statecount_
TEXAS4701
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881

13.6.3 - The case-sensitive !in string operator

Learn how to use the !in string operator to filter records for data without a case-sensitive string.

Filters a record set for data without a case-sensitive string.

Performance tips

Syntax

T | where col !in (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Example

List of scalars

The following query shows how to use !in with a comma-separated list of scalar values.

StormEvents 
| where State !in ("FLORIDA", "GEORGIA", "NEW YORK") 
| count

Output

Count
54291

Dynamic array

The following query shows how to use !in with a dynamic array.

StormEvents 
| where State !in (dynamic(["FLORIDA", "GEORGIA", "NEW YORK"])) 
| count

Output

Count
54291

The same query can also be written with a let statement.

let states = dynamic(["FLORIDA", "GEORGIA", "NEW YORK"]);
StormEvents 
| where State !in (states)
| summarize count() by State

Output

Count
54291

Tabular expression

The following query shows how to use !in with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.

StormEvents 
| where State !in (PopulationData | where Population > 5000000 | project State)
| summarize count() by State

Output

StateCount
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.

let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents 
| where State !in (large_states)
| summarize count() by State

Output

StateCount
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

13.6.4 - The case-sensitive in string operator

Learn how to use the in operator to filter data with a case-sensitive string.

Filters a record set for data with a case-sensitive string.

Performance tips

Syntax

T | where col in (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search considers up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Examples

List of scalars

The following query shows how to use in with a list of scalar values.

StormEvents 
| where State in ("FLORIDA", "GEORGIA", "NEW YORK") 
| count

Output

Count
4775

Dynamic array

The following query shows how to use in with a dynamic array.

let states = dynamic(['FLORIDA', 'ATLANTIC SOUTH', 'GEORGIA']);
StormEvents 
| where State in (states)
| count

Output

Count
3218

Tabular expression

The following query shows how to use in with a tabular expression.

let Top_5_States = 
    StormEvents
    | summarize count() by State
    | top 5 by count_; 
StormEvents 
| where State in (Top_5_States) 
| count

The same query can be written with an inline tabular expression statement.

StormEvents 
| where State in (
    StormEvents
    | summarize count() by State
    | top 5 by count_
    ) 
| count

Output

Count
14242

Top with other example

The following example identifies the top five states with lightning events and uses the iff() function and in operator to classify lightning events by the top five states, labeled by state name, and all others labeled as “Other.”

let Lightning_By_State = materialize(StormEvents
    | summarize lightning_events = countif(EventType == 'Lightning') by State);
let Top_5_States = Lightning_By_State | top 5 by lightning_events | project State; 
Lightning_By_State
| extend State = iff(State in (Top_5_States), State, "Other")
| summarize sum(lightning_events) by State 

Output

Statesum_lightning_events
ALABAMA29
WISCONSIN31
TEXAS55
FLORIDA85
GEORGIA106
Other415

Use a static list returned by a function

The following example counts events from the StormEvents table based on a predefined list of interesting states. The interesting states are defined by the InterestingStates() function.

StormEvents 
| where State in (InterestingStates()) 
| count

Output

Count
4775

The following query displays which states are considered interesting by the InterestingStates() function.

.show function InterestingStates

Output

NameParametersBodyFolderDocString
InterestingStates(){ dynamic(["WASHINGTON", "FLORIDA", "GEORGIA", "NEW YORK"]) }

13.7 - String operators

13.7.1 - matches regex operator

Learn how to use the matches regex string operator to filter a record set based on a case-sensitive regex value.

Filters a record set based on a case-sensitive regular expression value.

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Syntax

T | where col matches regex (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The column by which to filter.
expressionscalar✔️The regular expression used to filter. The maximum number of regex groups is 16. For more information about the regex syntax supported by Kusto, see regular expression.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State matches regex "K.*S"
| where event_count > 10
| project State, event_count

Output

Stateevent_count
KANSAS3166
ARKANSAS1028
LAKE SUPERIOR34
LAKE ST CLAIR32

13.7.2 - String operators

Learn about query operators for searching string data types.

Kusto Query Language (KQL) offers various query operators for searching string data types. The following article describes how string terms are indexed, lists the string query operators, and gives tips for optimizing performance.

Understanding string terms

Kusto indexes all columns, including columns of type string. Multiple indexes are built for such columns, depending on the actual data. These indexes aren’t directly exposed, but are used in queries with the string operators that have has as part of their name, such as has, !has, hasprefix, !hasprefix. The semantics of these operators are dictated by the way the column is encoded. Instead of doing a “plain” substring match, these operators match terms.

What is a term?

By default, each string value is broken into maximal sequences of alphanumeric characters, and each of those sequences is made into a term.

For example, in the following string, the terms are Kusto, KustoExplorerQueryRun, and the following substrings: ad67d136, c1db, 4f9f, 88ef, d94f3b6b0b5a.

Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a;KustoExplorerQueryRun

Kusto builds a term index consisting of all terms that are three characters or more, and this index is used by string operators such as has, !has, and so on. If the query looks for a term that is smaller than three characters, or uses a contains operator, then the query will revert to scanning the values in the column. Scanning is much slower than looking up the term in the term index.
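
For example, a has lookup matches only whole terms, while contains matches arbitrary substrings (a minimal sketch; only whole_term and substring_match should evaluate to true):

print whole_term = "Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a" has "c1db",
      partial_term = "Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a" has "1db",
      substring_match = "Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a" contains "1db"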

Operators on strings

The following abbreviations are used in this article:

  • RHS = right hand side of the expression
  • LHS = left hand side of the expression

Operators with an _cs suffix are case sensitive.

OperatorDescriptionCase-SensitiveExample (yields true)
==EqualsYes"aBc" == "aBc"
!=Not equalsYes"abc" != "ABC"
=~EqualsNo"abc" =~ "ABC"
!~Not equalsNo"aBc" !~ "xyz"
containsRHS occurs as a subsequence of LHSNo"FabriKam" contains "BRik"
!containsRHS doesn’t occur in LHSNo"Fabrikam" !contains "xyz"
contains_csRHS occurs as a subsequence of LHSYes"FabriKam" contains_cs "Kam"
!contains_csRHS doesn’t occur in LHSYes"Fabrikam" !contains_cs "Kam"
endswithRHS is a closing subsequence of LHSNo"Fabrikam" endswith "Kam"
!endswithRHS isn’t a closing subsequence of LHSNo"Fabrikam" !endswith "brik"
endswith_csRHS is a closing subsequence of LHSYes"Fabrikam" endswith_cs "kam"
!endswith_csRHS isn’t a closing subsequence of LHSYes"Fabrikam" !endswith_cs "brik"
hasRight-hand-side (RHS) is a whole term in left-hand-side (LHS)No"North America" has "america"
!hasRHS isn’t a full term in LHSNo"North America" !has "amer"
has_allSame as has but works on all of the elementsNo"North and South America" has_all("south", "north")
has_anySame as has but works on any of the elementsNo"North America" has_any("south", "north")
has_csRHS is a whole term in LHSYes"North America" has_cs "America"
!has_csRHS isn’t a full term in LHSYes"North America" !has_cs "amer"
hasprefixRHS is a term prefix in LHSNo"North America" hasprefix "ame"
!hasprefixRHS isn’t a term prefix in LHSNo"North America" !hasprefix "mer"
hasprefix_csRHS is a term prefix in LHSYes"North America" hasprefix_cs "Ame"
!hasprefix_csRHS isn’t a term prefix in LHSYes"North America" !hasprefix_cs "CA"
hassuffixRHS is a term suffix in LHSNo"North America" hassuffix "ica"
!hassuffixRHS isn’t a term suffix in LHSNo"North America" !hassuffix "americ"
hassuffix_csRHS is a term suffix in LHSYes"North America" hassuffix_cs "ica"
!hassuffix_csRHS isn’t a term suffix in LHSYes"North America" !hassuffix_cs "icA"
inEquals to any of the elementsYes"abc" in ("123", "345", "abc")
!inNot equals to any of the elementsYes"bca" !in ("123", "345", "abc")
in~Equals to any of the elementsNo"Abc" in~ ("123", "345", "abc")
!in~Not equals to any of the elementsNo"bCa" !in~ ("123", "345", "ABC")
matches regexLHS contains a match for RHSYes"Fabrikam" matches regex "b.*k"
startswithRHS is an initial subsequence of LHSNo"Fabrikam" startswith "fab"
!startswithRHS isn’t an initial subsequence of LHSNo"Fabrikam" !startswith "kam"
startswith_csRHS is an initial subsequence of LHSYes"Fabrikam" startswith_cs "Fab"
!startswith_csRHS isn’t an initial subsequence of LHSYes"Fabrikam" !startswith_cs "fab"

Performance tips

For better performance, when there are two operators that do the same task, use the case-sensitive one. For example:

  • Use ==, not =~
  • Use in, not in~
  • Use hassuffix_cs, not hassuffix

For faster results, if you’re testing for the presence of a symbol or alphanumeric word that is bound by non-alphanumeric characters, or the start or end of a field, use has or in. has works faster than contains, startswith, or endswith.

To search for IPv4 addresses or their prefixes, use one of the special operators on IPv4 addresses, which are optimized for this purpose.

For more information, see Query best practices.

For example, the first of these queries will run faster:

StormEvents | where State has "North" | count;
StormEvents | where State contains "nor" | count

Operators on IPv4 addresses

The following group of operators provides index-accelerated search on IPv4 addresses or their prefixes.

OperatorDescriptionExample (yields true)
has_ipv4LHS contains IPv4 address represented by RHShas_ipv4("Source address is 10.1.2.3:1234", "10.1.2.3")
has_ipv4_prefixLHS contains an IPv4 address that matches a prefix represented by RHShas_ipv4_prefix("Source address is 10.1.2.3:1234", "10.1.2.")
has_any_ipv4LHS contains one of IPv4 addresses provided by RHShas_any_ipv4("Source address is 10.1.2.3:1234", dynamic(["10.1.2.3", "127.0.0.1"]))
has_any_ipv4_prefixLHS contains an IPv4 address that matches one of prefixes provided by RHShas_any_ipv4_prefix("Source address is 10.1.2.3:1234", dynamic(["10.1.2.", "127.0.0."]))
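
For example (a minimal sketch based on the first two rows of the table above; both expressions should yield true):

print exact = has_ipv4("Source address is 10.1.2.3:1234", "10.1.2.3"),
      prefix = has_ipv4_prefix("Source address is 10.1.2.3:1234", "10.1.2.")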

13.7.3 - The case-insensitive !~ (not equals) string operator

Learn how to use the !~ (not equals) string operator to filter records for data that doesn’t match a case-insensitive string.

Filters a record set for data that doesn’t match a case-insensitive string.

The following table provides a comparison of the == (equals) operators:

OperatorDescriptionCase-SensitiveExample (yields true)
==EqualsYes"aBc" == "aBc"
!=Not equalsYes"abc" != "ABC"
=~EqualsNo"abc" =~ "ABC"
!~Not equalsNo"aBc" !~ "xyz"

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Performance tips

When possible, use the case-sensitive !=.

Syntax

T | where column !~ (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where (State !~ "texas") and (event_count > 3000)
| project State, event_count

Output

Stateevent_count
KANSAS3,166

13.7.4 - The case-insensitive !contains string operator

Learn how to use the !contains string operator to filter data that doesn’t include a case sensitive string.

Filters a record set for data that doesn’t include a case-sensitive string. !contains searches for characters rather than terms of three or more characters. The query scans the values in the column, which is slower than looking up a term in a term index.

Performance tips

When possible, use the case-sensitive !contains_cs.

Use !has if you’re looking for a term.

Syntax

Case insensitive syntax

T | where Column !contains (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column by which to filter.
Expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !contains "kan"
| where event_count > 3000
| project State, event_count

Output

Stateevent_count
TEXAS4701

13.7.5 - The case-insensitive !endswith string operator

Learn how to use the !endswith string operator to filter records for data that excludes a case-insensitive ending string.

Filters a record set for data that excludes a case-insensitive ending string.

Performance tips

When possible, use the case-sensitive !endswith_cs.

Syntax

T | where col !endswith (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The column to filter.
expressionstring✔️The expression used to filter.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize Events=count() by State
| where State !endswith "is"
| where Events > 2000
| project State, Events

Output

StateEvents
TEXAS4701
KANSAS3166
IOWA2337
MISSOURI2016

13.7.6 - The case-insensitive !has string operators

Learn how to use the !has string operator to filter records for data that doesn’t have a matching case-insensitive string.

Filters a record set for data that doesn’t have a matching case-insensitive string. !has searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

Performance tips

When possible, use the case-sensitive !has_cs.

Syntax

T | where column !has (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !has "NEW"
| where event_count > 3000
| project State, event_count

Output

Stateevent_count
TEXAS4,701
KANSAS3,166

13.7.7 - The case-insensitive !hasprefix string operator

Learn how to use the !hasprefix operator to filter records for data that doesn’t include a case-insensitive prefix.

Filters a record set for data that doesn’t include a case-insensitive starting string.

For best performance, use strings of three characters or more. !hasprefix searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

Performance tips

When possible, use the case-sensitive !hasprefix_cs.

Syntax

T | where Column !hasprefix (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column used to filter.
Expressionstring✔️The expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !hasprefix "N"
| where event_count > 2000
| project State, event_count

Output

Stateevent_count
TEXAS4701
KANSAS3166
IOWA2337
ILLINOIS2022
MISSOURI2016

13.7.8 - The case-insensitive !hassuffix string operator

Learn how to use the !hassuffix string operator to filter records for data that doesn’t have a case-insensitive suffix.

Filters a record set for data that doesn’t have a case-insensitive ending string. !hassuffix returns true if there’s no term inside the string column ending with the specified string expression.

Performance tips

When possible, use !hassuffix_cs - a case-sensitive version of the operator.

Syntax

T | where column !hassuffix (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !hassuffix "A"
| where event_count > 2000
| project State, event_count

Output

Stateevent_count
TEXAS4701
KANSAS3166
ILLINOIS2022
MISSOURI2016

13.7.9 - The case-insensitive !in~ string operator

Learn how to use the !in~ string operator to filter records for data without a case-insensitive string.

Filters a record set for data without a case-insensitive string.

Performance tips

When possible, use the case-sensitive !in.

Syntax

T | where col !in~ (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Example

List of scalars

The following query shows how to use !in~ with a comma-separated list of scalar values.

StormEvents 
| where State !in~ ("Florida", "Georgia", "New York") 
| count

Output

Count
54,291

Dynamic array

The following query shows how to use !in~ with a dynamic array.

StormEvents 
| where State !in~ (dynamic(["Florida", "Georgia", "New York"])) 
| count

Output

Count
54291

The same query can also be written with a let statement.

let states = dynamic(["Florida", "Georgia", "New York"]);
StormEvents 
| where State !in~ (states)
| summarize count() by State

Output

Count
54291

Tabular expression

The following query shows how to use !in~ with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.

StormEvents 
| where State !in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State

Output

Statecount_
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.

let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents 
| where State !in~ (large_states)
| summarize count() by State

Output

Statecount_
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

13.7.10 - The case-insensitive !startswith string operators

Learn how to use the !startswith string operator to filter records for data that doesn’t start with a case-insensitive search string.

Filters a record set for data that doesn’t start with a case-insensitive search string.

Performance tips

When possible, use the case-sensitive !startswith_cs.

Syntax

T | where column !startswith (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !startswith "i"
| where event_count > 2000
| project State, event_count

Output

Stateevent_count
TEXAS4701
KANSAS3166
MISSOURI2016

13.7.11 - The case-insensitive =~ (equals) string operator

Learn how to use the =~ (equals) operator to filter a record set for data with a case-insensitive string.

Filters a record set for data with a case-insensitive string.

The following table provides a comparison of the == (equals) operators:

OperatorDescriptionCase-SensitiveExample (yields true)
==EqualsYes"aBc" == "aBc"
!=Not equalsYes"abc" != "ABC"
=~EqualsNo"abc" =~ "ABC"
!~Not equalsNo"aBc" !~ "xyz"

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Performance tips

When possible, use == - a case-sensitive version of the operator.

Syntax

T | where col =~ (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The column to filter.
expressionstring✔️The expression used to filter.

Returns

Rows in T for which the predicate is true.

Example

The State values in the StormEvents table are capitalized. The following query matches columns with the value “KANSAS”.

StormEvents
| where State =~ "kansas"
| project EventId, State

The following table only shows the first 10 results. To see the full output, run the query.

EventIdState
70787KANSAS
43450KANSAS
43451KANSAS
38844KANSAS
18463KANSAS
18464KANSAS
18495KANSAS
43466KANSAS
43467KANSAS
43470KANSAS

13.7.12 - The case-insensitive contains string operator

Learn how to use the contains operator to filter a record set for data containing a case-insensitive string.

Filters a record set for data containing a case-insensitive string. contains searches for arbitrary sub-strings rather than terms.

Performance tips

When possible, use contains_cs - a case-sensitive version of the operator.

If you’re looking for a term, use has for faster results.

Syntax

T | where col contains (string)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The name of the column to check for string.
stringstring✔️The case-insensitive string by which to filter the data.

Returns

Rows in T for which string is in col.

Example

StormEvents
| summarize event_count=count() by State
| where State contains "enn"
| where event_count > 10
| project State, event_count
| render table

Output

Stateevent_count
PENNSYLVANIA1687
TENNESSEE1125

13.7.13 - The case-insensitive endswith string operator

Learn how to use the endswith operator to filter a record set for data with a case-insensitive string.

Filters a record set for data with a case-insensitive ending string.

Performance tips

For faster results, use the case-sensitive version of an operator. For example, use endswith_cs instead of endswith.

Syntax

T | where col endswith (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The column to filter.
expressionstring✔️The expression used to filter.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize Events=count() by State
| where State endswith "sas"
| where Events > 10
| project State, Events

Output

StateEvents
KANSAS3166
ARKANSAS1028

13.7.14 - The case-insensitive has string operator

Learn how to use the has operator to filter data with a case-insensitive string.

Filters a record set for data with a case-insensitive string. has searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

Performance tips

When possible, use the case-sensitive has_cs.

Syntax

T | where Column has (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column used to filter the records.
Expressionscalar or tabular✔️An expression for which to search. If the value is a tabular expression and has multiple columns, the first column is used.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State has "New"
| where event_count > 10
| project State, event_count

Output

Stateevent_count
NEW YORK1,750
NEW JERSEY1,044
NEW MEXICO527
NEW HAMPSHIRE394

13.7.15 - The case-insensitive has_all string operator

Learn how to use the has_all string operator to filter a record set for data with one or more case-insensitive search strings.

Filters a record set for data with one or more case-insensitive search strings. has_all searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Syntax

T | where col has_all (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 256 distinct values.

Returns

Rows in T for which the predicate is true.

Examples

Set of scalars

The following query shows how to use has_all with a comma-separated set of scalar values.

StormEvents 
| where EpisodeNarrative has_all ("cold", "strong", "afternoon", "hail")
| summarize Count=count() by EventType
| top 3 by Count

Output

EventTypeCount
Thunderstorm Wind517
Hail392
Flash Flood24

Dynamic array

The same result can be achieved using a dynamic array notation.

StormEvents 
| where EpisodeNarrative has_all (dynamic(["cold", "strong", "afternoon", "hail"]))
| summarize Count=count() by EventType
| top 3 by Count

Output

EventTypeCount
Thunderstorm Wind517
Hail392
Flash Flood24

The same query can also be written with a let statement.

let criteria = dynamic(["cold", "strong", "afternoon", "hail"]);
StormEvents 
| where EpisodeNarrative has_all (criteria)
| summarize Count=count() by EventType
| top 3 by Count

Output

EventTypeCount
Thunderstorm Wind517
Hail392
Flash Flood24

13.7.16 - The case-insensitive has_any string operator

Learn how to use the has_any operator to filter data with any set of case-insensitive strings.

Filters a record set for data with any set of case-insensitive strings. has_any searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Performance tips

Syntax

T | where col has_any (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 10,000 distinct values.

Returns

Rows in T for which the predicate is true.

Examples

List of scalars

The following query shows how to use has_any with a comma-separated list of scalar values.

StormEvents 
| where State has_any ("CAROLINA", "DAKOTA", "NEW") 
| summarize count() by State

Output

Statecount_
NEW YORK1750
NORTH CAROLINA1721
SOUTH DAKOTA1567
NEW JERSEY1044
SOUTH CAROLINA915
NORTH DAKOTA905
NEW MEXICO527
NEW HAMPSHIRE394

Dynamic array

The following query shows how to use has_any with a dynamic array.

StormEvents 
| where State has_any (dynamic(['south', 'north']))
| summarize count() by State

Output

Statecount_
NORTH CAROLINA1721
SOUTH DAKOTA1567
SOUTH CAROLINA915
NORTH DAKOTA905
ATLANTIC SOUTH193
ATLANTIC NORTH188

The same query can also be written with a let statement.

let areas = dynamic(['south', 'north']);
StormEvents 
| where State has_any (areas)
| summarize count() by State

Output

Statecount_
NORTH CAROLINA1721
SOUTH DAKOTA1567
SOUTH CAROLINA915
NORTH DAKOTA905
ATLANTIC SOUTH193
ATLANTIC NORTH188

Tabular expression

The following query shows how to use has_any with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.

StormEvents 
| where State has_any ((PopulationData | where Population > 5000000 | project State))
| summarize count() by State

Output

Statecount_
TEXAS4701
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881

The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.

let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents 
| where State has_any (large_states)
| summarize count() by State

Output

Statecount_
TEXAS4701
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881


13.7.17 - The case-insensitive hasprefix string operator

Learn how to use the hasprefix operator to filter data with a case-insensitive string.

Filters a record set for data with a case-insensitive starting string.

For best performance, use strings of three characters or more. hasprefix searches for indexed terms, where a term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

Performance tips

When possible, use the case-sensitive hasprefix_cs.

Syntax

T | where Column hasprefix (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column used to filter.
Expressionstring✔️The expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State hasprefix "la"
| project State, event_count

Output

Stateevent_count
LAKE MICHIGAN182
LAKE HURON63
LAKE SUPERIOR34
LAKE ST CLAIR32
LAKE ERIE27
LAKE ONTARIO8

13.7.18 - The case-insensitive hassuffix string operator

Learn how to use the hassuffix operator to filter data with a case-insensitive suffix string.

Filters a record set for data with a case-insensitive ending string. hassuffix returns true if there is a term inside the filtered string column ending with the specified string expression.

Performance tips

When possible, use the case-sensitive hassuffix_cs.

Syntax

T | where Column hassuffix (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column by which to filter.
Expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State hassuffix "o"
| project State, event_count

Output

Stateevent_count
COLORADO1654
OHIO1233
GULF OF MEXICO577
NEW MEXICO527
IDAHO247
PUERTO RICO192
LAKE ONTARIO8

13.7.19 - The case-insensitive in~ string operator

Learn how to use the in~ operator to filter data with a case-insensitive string.

Filters a record set for data with a case-insensitive string.

Performance tips

When possible, use the case-sensitive in.

Syntax

T | where col in~ (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Examples

List of scalars

The following query shows how to use in~ with a comma-separated list of scalar values.

StormEvents 
| where State in~ ("FLORIDA", "georgia", "NEW YORK") 
| count

Output

Count
4775

Dynamic array

The following query shows how to use in~ with a dynamic array.

StormEvents 
| where State in~ (dynamic(["FLORIDA", "georgia", "NEW YORK"])) 
| count

Output

Count
4775

The same query can also be written with a let statement.

let states = dynamic(["FLORIDA", "georgia", "NEW YORK"]);
StormEvents 
| where State in~ (states)
| summarize count() by State

Output

Count
4775

Tabular expression

The following query shows how to use in~ with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.

StormEvents 
| where State in~ (PopulationData | where Population > 5000000 | project State)
| summarize count() by State

Output

Statecount_
TEXAS4701
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881

The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.

let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents 
| where State in~ (large_states)
| summarize count() by State

Output

Statecount_
TEXAS4701
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881

13.7.20 - The case-insensitive startswith string operator

Learn how to use the case-insensitive startswith string operator to filter a record set with a case-insensitive string starting sequence.

Filters a record set for data with a case-insensitive string starting sequence.

Performance tips

When possible, use the case-sensitive startswith_cs.

Syntax

T | where col startswith (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column used to filter.
expressionstring✔️The expression by which to filter.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State startswith "Lo"
| where event_count > 10
| project State, event_count

Output

Stateevent_count
LOUISIANA463

13.7.21 - The case-sensitive != (not equals) string operator

Learn how to use the != (not equals) string operator to filter records for data that doesn’t match a case-sensitive string.

Filters a record set for data that doesn’t match a case-sensitive string.

The following table provides a comparison of the == (equals) operators:

OperatorDescriptionCase-SensitiveExample (yields true)
==EqualsYes"aBc" == "aBc"
!=Not equalsYes"abc" != "ABC"
=~EqualsNo"abc" =~ "ABC"
!~Not equalsNo"aBc" !~ "xyz"

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Performance tips

Syntax

T | where column != (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where (State != "FLORIDA") and (event_count > 4000)
| project State, event_count

Output

Stateevent_count
TEXAS4,701

13.7.22 - The case-sensitive !contains_cs string operator

Learn how to use the !contains_cs string operator to filter data that doesn’t include a case-sensitive string.

Filters a record set for data that doesn’t include a case-sensitive string. !contains_cs searches for characters rather than terms of three or more characters. The query scans the values in the column, which is slower than looking up a term in a term index.

Performance tips

If you’re looking for a term, use !has_cs for faster results.

Syntax

Case-sensitive syntax

T | where Column !contains_cs (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column by which to filter.
Expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Examples

StormEvents
| summarize event_count=count() by State
| where State !contains_cs "AS"
| count

Output

Count
59
StormEvents
| summarize event_count=count() by State
| where State !contains_cs "TEX"
| where event_count > 3000
| project State, event_count

Output

Stateevent_count
KANSAS3,166

13.7.23 - The case-sensitive !endswith_cs string operator

Learn how to use the !endswith_cs string operator to filter data that doesn’t contain a case-sensitive ending string.

Filters a record set for data that doesn’t contain a case-sensitive ending string.

Performance tips

Syntax

T | where col !endswith_cs (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The column to filter.
expressionstring✔️The expression used to filter.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize Events=count() by State
| where State !endswith_cs "A"

The following table only shows the first 10 results. To see the full output, run the query.

StateEvents
TEXAS4701
KANSAS3166
ILLINOIS2022
MISSOURI2016
WISCONSIN1850
NEW YORK1750
COLORADO1654
MICHIGAN1637
KENTUCKY1391
OHIO1233

13.7.24 - The case-sensitive !has_cs string operator

Learn how to use the !has_cs string operator to filter records for data that doesn’t have a matching case-sensitive string.

Filters a record set for data that doesn’t have a matching case-sensitive string. !has_cs searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

Performance tips

Syntax

T | where column !has_cs (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !has_cs "new"
| count

Output

Count
67

13.7.25 - The case-sensitive !hasprefix_cs string operator

Learn how to use the !hasprefix_cs string operator to filter records for data that doesn’t have a case-sensitive prefix.

Filters a record set for data that doesn’t have a case-sensitive starting string. !hasprefix_cs searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

OperatorDescriptionCase-SensitiveExample (yields true)
hasprefixRHS is a term prefix in LHSNo"North America" hasprefix "ame"
!hasprefixRHS isn’t a term prefix in LHSNo"North America" !hasprefix "mer"
hasprefix_csRHS is a term prefix in LHSYes"North America" hasprefix_cs "Ame"
!hasprefix_csRHS isn’t a term prefix in LHSYes"North America" !hasprefix_cs "CA"
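
To verify these behaviors directly, the following minimal sketch evaluates the literal comparisons from the table with the print operator; all four columns are expected to return true.

print prefix_ci = "North America" hasprefix "ame",
      not_prefix_ci = "North America" !hasprefix "mer",
      prefix_cs = "North America" hasprefix_cs "Ame",
      not_prefix_cs = "North America" !hasprefix_cs "CA"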

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Performance tips

Syntax

T | where column !hasprefix_cs (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !hasprefix_cs "P"
| count

Output

Count
64

13.7.26 - The case-sensitive !hassuffix_cs string operator

Learn how to use the !hassuffix_cs string operator to filter records for data that doesn’t have a case-sensitive suffix.

Filters a record set for data that doesn’t have a case-sensitive ending string. !hassuffix_cs returns true if there is no term inside string column ending with the specified string expression.

Performance tips

Syntax

T | where column !hassuffix_cs (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !hassuffix_cs "AS"
| where event_count > 2000
| project State, event_count

Output

Stateevent_count
IOWA2337
ILLINOIS2022
MISSOURI2016

13.7.27 - The case-sensitive !in string operator

Learn how to use the !in string operator to filter records for data without a case-sensitive string.

Filters a record set for data without a case-sensitive string.

Performance tips

Syntax

T | where col !in (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search will consider up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Example

List of scalars

The following query shows how to use !in with a comma-separated list of scalar values.

StormEvents 
| where State !in ("FLORIDA", "GEORGIA", "NEW YORK") 
| count

Output

Count
54291

Dynamic array

The following query shows how to use !in with a dynamic array.

StormEvents 
| where State !in (dynamic(["FLORIDA", "GEORGIA", "NEW YORK"])) 
| count

Output

Count
54291

The same query can also be written with a let statement.

let states = dynamic(["FLORIDA", "GEORGIA", "NEW YORK"]);
StormEvents 
| where State !in (states)
| count

Output

Count
54291

Tabular expression

The following query shows how to use !in with an inline tabular expression. Notice that an inline tabular expression must be enclosed with double parentheses.

StormEvents 
| where State !in ((PopulationData | where Population > 5000000 | project State))
| summarize count() by State

Output

StateCount
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

The same query can also be written with a let statement. Notice that the double parentheses as provided in the last example aren’t necessary in this case.

let large_states = PopulationData | where Population > 5000000 | project State;
StormEvents 
| where State !in (large_states)
| summarize count() by State

Output

StateCount
KANSAS3166
IOWA2337
NEBRASKA1766
OKLAHOMA1716
SOUTH DAKOTA1567

13.7.28 - The case-sensitive !startswith_cs string operator

Learn how to use the !startswith_cs string operator to filter records for data that doesn’t start with a case-sensitive search string.

Filters a record set for data that doesn’t start with a case-sensitive search string.

Performance tips

Syntax

T | where column !startswith_cs (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State !startswith_cs "I"
| where event_count > 2000
| project State, event_count

Output

Stateevent_count
TEXAS4701
KANSAS3166
MISSOURI2016

13.7.29 - The case-sensitive == (equals) string operator

Learn how to use the == (equals) operator to filter a record set for data matching a case-sensitive string.

Filters a record set for data matching a case-sensitive string.

The following table provides a comparison of the == operators:

OperatorDescriptionCase-SensitiveExample (yields true)
==EqualsYes"aBc" == "aBc"
!=Not equalsYes"abc" != "ABC"
=~EqualsNo"abc" =~ "ABC"
!~Not equalsNo"aBc" !~ "xyz"
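
As a quick check, the following sketch evaluates the four comparisons from the table with the print operator; every column is expected to return true.

print eq_cs = "aBc" == "aBc",
      ne_cs = "abc" != "ABC",
      eq_ci = "abc" =~ "ABC",
      ne_ci = "aBc" !~ "xyz"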

For more information about other operators and to determine which operator is most appropriate for your query, see datatype string operators.

Performance tips

Syntax

T | where col == (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The column to filter.
expressionstring✔️The expression used to filter.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| where State == "kansas"
| count 
Count
0
StormEvents
| where State == "KANSAS"
| count 
Count
3,166

13.7.30 - The case-sensitive contains_cs string operator

Learn how to use the contains_cs operator to filter a record set for data containing a case-sensitive string.

Filters a record set for data containing a case-sensitive string. contains_cs searches for arbitrary sub-strings rather than terms.

Performance tips

If you’re looking for a term, use has_cs for faster results.

Syntax

T | where col contains_cs (string)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The name of the column to check for string.
stringstring✔️The case-sensitive string by which to filter the data.

Returns

Rows in T for which string is in col.

Example

StormEvents
| summarize event_count=count() by State
| where State contains_cs "AS"
| count

Output

Count
8

13.7.31 - The case-sensitive endswith_cs string operator

Learn how to use the endswith_cs operator to filter a record set for data with a case-sensitive ending string.

Filters a record set for data with a case-sensitive ending string.

Performance tips

Syntax

T | where col endswith_cs (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
colstring✔️The column to filter.
expressionstring✔️The expression used to filter.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize Events = count() by State
| where State endswith_cs "NA"

Output

StateEvents
NORTH CAROLINA1721
MONTANA1230
INDIANA1164
SOUTH CAROLINA915
LOUISIANA463
ARIZONA340

13.7.32 - The case-sensitive has_cs string operator

Learn how to use the has_cs operator to filter data with a case-sensitive search string.

Filters a record set for data with a case-sensitive search string. has_cs searches for indexed terms, where an indexed term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

Performance tips

Syntax

T | where Column has_cs (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column used to filter the records.
Expressionscalar or tabular✔️An expression for which to search. If the value is a tabular expression and has multiple columns, the first column is used.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State has_cs "FLORIDA"

Output

Stateevent_count
FLORIDA1042

Since all State values are capitalized, searching for a lowercase string with the same value, such as “florida”, won’t yield any results.

StormEvents
| summarize event_count=count() by State
| where State has_cs "florida"

Output

Stateevent_count

13.7.33 - The case-sensitive hasprefix_cs string operator

Learn how to use the hasprefix_cs operator to filter data with a case-sensitive prefix string.

Filters a record set for data with a case-sensitive starting string.

For best performance, use strings of three characters or more. hasprefix_cs searches for indexed terms, where a term is three or more characters. If your term is fewer than three characters, the query scans the values in the column, which is slower than looking up the term in the term index.

Performance tips

Syntax

T | where Column hasprefix_cs (Expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
Columnstring✔️The column used to filter.
Expressionstring✔️The expression for which to search.

Returns

Rows in T for which the predicate is true.

Examples

StormEvents
| summarize event_count=count() by State
| where State hasprefix_cs "P"
| count 
Count
3
StormEvents
| summarize event_count=count() by State
| where State hasprefix_cs "P"
| project State, event_count
Stateevent_count
PENNSYLVANIA1687
PUERTO RICO192
E PACIFIC10

13.7.34 - The case-sensitive hassuffix_cs string operator

Learn how to use the hassuffix_cs operator to filter data with a case-sensitive suffix string.

Filters a record set for data with a case-sensitive ending string. hassuffix_cs returns true if there is a term inside the filtered string column ending with the specified string expression.

Performance tips

Syntax

T | where column hassuffix_cs ( expression )

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be filtered.
columnstring✔️The column by which to filter.
expressionscalar✔️The scalar or literal expression for which to search.

Returns

Rows in T for which the predicate is true.

Examples

StormEvents
| summarize event_count=count() by State
| where State hassuffix_cs "AS"
| where event_count > 2000
| project State, event_count

Output

Stateevent_count
TEXAS4701
KANSAS3166

13.7.35 - The case-sensitive in string operator

Learn how to use the in operator to filter data with a case-sensitive string.

Filters a record set for data with a case-sensitive string.

Performance tips

Syntax

T | where col in (expression, ...)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column by which to filter.
expressionscalar or tabular✔️An expression that specifies the values for which to search. Each expression can be a scalar value or a tabular expression that produces a set of values. If a tabular expression has multiple columns, the first column is used. The search considers up to 1,000,000 distinct values.

Returns

Rows in T for which the predicate is true.

Examples

List of scalars

The following query shows how to use in with a list of scalar values.

StormEvents 
| where State in ("FLORIDA", "GEORGIA", "NEW YORK") 
| count

Output

Count
4775

Dynamic array

The following query shows how to use in with a dynamic array.

let states = dynamic(['FLORIDA', 'ATLANTIC SOUTH', 'GEORGIA']);
StormEvents 
| where State in (states)
| count

Output

Count
3218

Tabular expression

The following query shows how to use in with a tabular expression.

let Top_5_States = 
    StormEvents
    | summarize count() by State
    | top 5 by count_; 
StormEvents 
| where State in (Top_5_States) 
| count

The same query can be written with an inline tabular expression statement.

StormEvents 
| where State in (
    StormEvents
    | summarize count() by State
    | top 5 by count_
    ) 
| count

Output

Count
14242

Top with other example

The following example identifies the top five states with lightning events and uses the iff() function and in operator to classify lightning events by the top five states, labeled by state name, and all others labeled as “Other.”

let Lightning_By_State = materialize(StormEvents
    | summarize lightning_events = countif(EventType == 'Lightning') by State);
let Top_5_States = Lightning_By_State | top 5 by lightning_events | project State; 
Lightning_By_State
| extend State = iff(State in (Top_5_States), State, "Other")
| summarize sum(lightning_events) by State 

Output

Statesum_lightning_events
ALABAMA29
WISCONSIN31
TEXAS55
FLORIDA85
GEORGIA106
Other415

Use a static list returned by a function

The following example counts events from the StormEvents table based on a predefined list of interesting states. The interesting states are defined by the InterestingStates() function.

StormEvents 
| where State in (InterestingStates()) 
| count

Output

Count
4775

The following query displays which states are considered interesting by the InterestingStates() function.

.show function InterestingStates

Output

NameParametersBodyFolderDocString
InterestingStates(){ dynamic(["WASHINGTON", "FLORIDA", "GEORGIA", "NEW YORK"]) }
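
Based on the body shown in this output, a comparable function could be created with a command along the following lines (a sketch; the actual function definition may also include a folder and docstring).

.create function InterestingStates() { dynamic(["WASHINGTON", "FLORIDA", "GEORGIA", "NEW YORK"]) }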

13.7.36 - The case-sensitive startswith_cs string operator

Learn how to use the startswith_cs string operator to filter a record set with a case-sensitive string starting sequence.

Filters a record set for data with a case-sensitive string starting sequence.

Performance tips

Syntax

T | where col startswith_cs (expression)

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to filter.
colstring✔️The column used to filter.
expressionstring✔️The expression by which to filter.

Returns

Rows in T for which the predicate is true.

Example

StormEvents
| summarize event_count=count() by State
| where State startswith_cs "I"
| where event_count > 2000
| project State, event_count

Output

Stateevent_count
IOWA2337
ILLINOIS2022

14 - Special functions

14.1 - cluster()

Learn how to use the cluster() function to change the reference of the query to a remote cluster or Eventhouse.

Changes the reference of the query to a remote cluster. To access a database within the same cluster, use the database() function. For more information, see cross-database and cross-cluster queries.

Changes the reference of the query to a remote Eventhouse. To access a database within the same Eventhouse, use the database() function. For more information, see cross-database and cross-cluster queries.

Syntax

cluster(name)

Parameters

NameTypeRequiredDescription
namestring✔️The name of the cluster to reference. The value can be specified as a fully qualified domain name, or the name of the cluster without the .kusto.windows.net suffix. The cluster name is case-insensitive, and providing it in lowercase is recommended. The value can’t be the result of subquery evaluation.
NameTypeRequiredDescription
namestring✔️The full URL of the Eventhouse to reference. The value can be specified as a fully qualified domain name, or the name of the Eventhouse. The Eventhouse name is case-insensitive, and providing it in lowercase is recommended. The value can’t be the result of subquery evaluation.

Examples

Use cluster() to access remote cluster

The following query can be run on any cluster.

cluster('help').database('Samples').StormEvents | count

cluster('help.kusto.windows.net').database('Samples').StormEvents | count

Use cluster() to access remote Eventhouse

The following query can be run on any Eventhouse.

cluster('help').database('Samples').StormEvents | count

cluster('help.kusto.windows.net').database('Samples').StormEvents | count

Output

Count
59066

Use cluster() inside let statements

The previous query can be rewritten to use a query-defined function (let statement) that takes a parameter called clusterName and passes it to the cluster() function.

let foo = (clusterName:string)
{
    cluster(clusterName).database('Samples').StormEvents | count
};
foo('help')

Output

Count
59066

Use cluster() inside Functions

The same query as above can be rewritten to be used in a function that receives a parameter clusterName - which is passed into the cluster() function.

.create function foo(clusterName:string)
{
    cluster(clusterName).database('Samples').StormEvents | count
};

14.2 - Cross-cluster and cross-database queries

This article describes cross-database and cross-cluster queries.

Queries run with a particular database designated as the database in context. This database acts as the default for permission checking. If an entity is referenced in a query without specifying the cluster or database context, it’s resolved against this database.

This article explains how to execute queries that involve entities located outside the current context database.

Prerequisites

Identify the cluster and database in context

Identify the eventhouse and database in context

The following table explains how to identify the database in context by query environment.

EnvironmentDatabase in context
Kusto ExplorerThe default database is the one selected in the connections panel, and the current cluster is the cluster containing that database.
Azure Data Explorer web UIThe default database is the one selected in the connection pane, and the current cluster is the cluster containing that database.
Client librariesSpecify the default database and cluster by the Data Source and Initial Catalog properties of the Kusto connection strings.
EnvironmentDatabase/Eventhouse in context
Kusto ExplorerThe default database is the one selected in the connections panel and the current eventhouse is the eventhouse containing that database.
Real-Time Intelligence KQL querysetThe default database is the current database selected either directly or through an eventhouse.
Client librariesSpecify the default database with the database URI, used for the Data Source properties of the Kusto connection strings. For the eventhouse, use its cluster URI. You can find it by selecting System Overview in the Eventhouse details section for the selected eventhouse.

Perform cross-cluster or cross-database queries

Perform cross-eventhouse or cross-database queries

To access entities outside the database in context, use the cluster() and database() functions to qualify the entity name.

For a table in a different database within the same cluster:

database("<DatabaseName>").<TableName>

For a table in a remote cluster:

cluster("<ClusterName>").database("<DatabaseName>").<TableName>

For a table in a different database within the same eventhouse:

database("<DatabaseName>").<TableName>

For a table in a remote eventhouse or remote service (like Azure Data Explorer) cluster:

cluster("<EventhouseClusterURI>").database("<DatabaseName>").<TableName>

Qualified names and the union operator

When a qualified name appears as an operand of the union operator, then wildcards can be used to specify multiple tables and multiple databases. Wildcards aren’t permitted in cluster names.

union withsource=TableName *, database("OtherDb*").*Table, cluster("OtherCluster").database("*").*

When a qualified name appears as an operand of the union operator, then wildcards can be used to specify multiple tables and multiple databases. Wildcards aren’t permitted in eventhouse names.

union withsource=TableName *, database("OtherDb*").*Table, cluster("OtherEventhouseClusterURI").database("*").*

Qualified names and restrict access statements

Qualified names or patterns can also be included in restrict access statement. Wildcards in cluster names aren’t permitted. Wildcards in eventhouse names aren’t permitted.

The following query restricts query access to the following entities:

  • Any entity name starting with my… in the default database.
  • Any table in all the databases named MyOther… of the current cluster.
  • Any table in all the databases named my2… in the cluster OtherCluster.kusto.windows.net.
restrict access to (my*, database("MyOther*").*, cluster("OtherCluster").database("my2*").*);
  • Any entity name starting with event… in the default database.
  • Any table in all the databases named EventOther… of the current eventhouse.
  • Any table in all the databases named event2… in the eventhouse OtherEventhouse.kusto.data.microsoft.com.
restrict access to (event*, database("EventOther*").*, cluster("OtherEventhouseClusterURI").database("event2*").*);

Handle schema changes of remote entities

To process a cross-cluster query, the cluster that performs the initial query interpretation needs to have the schema of the entities referenced on remote clusters. To obtain this information, a command is sent to retrieve the schemas, which are then stored in a cache.

If there’s a schema change in the remote cluster, a cached schema might become outdated. This can lead to undesired effects, including scenarios where new or deleted columns cause a Partial query failure. To solve such issues, manually refresh the schema with the .clear cache remote-schema command.

To process a cross-eventhouse or eventhouse-to-ADX cluster query, the eventhouse that performs the initial query interpretation needs to have the schema of the entities referenced on remote eventhouses or clusters. To obtain this information, a command is sent to retrieve the schemas, which are then stored in a cache.

If there’s a remote schema change, a cached schema might become outdated. This can lead to undesired effects, including scenarios where new or deleted columns cause a Partial query failure. To solve such issues, manually refresh the schema with the .clear cache remote-schema command.
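
For example, the following sketch refreshes the cached schema of a specific remote database; the cluster and database names here are hypothetical placeholders.

.clear cache remote-schema cluster("cluster1").database("database1")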

Functions and views

Functions and views (persistent and created inline) can reference tables across database and cluster boundaries. The following code is valid.

let MyView = Table1 join database("OtherDb").Table2 on Key | join cluster("OtherCluster").database("SomeDb").Table3 on Key;
MyView | where ...

Persistent functions and views can be accessed from another database in the same cluster.

For example, say you create the following tabular function (view) in a database OtherDb:

.create function MyView(v:string) { Table1 | where Column1 has v ...  }  

Then, you create the following scalar function in a database OtherDb:

.create function MyCalc(a:double, b:double, c:double) { (a + b) / c }  

In the default database, these entities can be referenced as follows:

database("OtherDb").MyView("exception") | extend CalCol=database("OtherDb").MyCalc(Col1, Col2, Col3) | take 10

Functions and views (persistent and created inline) can reference tables across database and eventhouse boundaries. The following code is valid.

let EventView = Table1 join database("OtherDb").Table2 on Key | join cluster("OtherEventhouseClusterURI").database("SomeDb").Table3 on Key;
EventView | where ...

Persistent functions and views can be accessed from another database in the same eventhouse.

For example, say you create the following tabular function (view) in a database OtherDb:

.create function EventView(v:string) { Table1 | where Column1 has v ...  }  

Then, you create the following scalar function in a database OtherDb:

.create function EventCalc(a:double, b:double, c:double) { (a + b) / c }  

In the default database, these entities can be referenced as follows:

database("OtherDb").EventView("exception") | extend CalCol=database("OtherDb").EventCalc(Col1, Col2, Col3) | take 10

Limitations of cross-cluster function calls

Tabular functions or views can be referenced across clusters. The following limitations apply:

  • Remote functions must return tabular schema. Scalar functions can only be accessed in the same cluster.
  • Remote functions can accept only scalar arguments. Functions that get one or more table arguments can only be accessed in the same cluster.
  • Remote functions’ result schema must be fixed (known in advance without executing parts of the query). So query constructs such as the pivot plugin can’t be used. Some plugins, such as the bag_unpack plugin, support a way to indicate the result schema statically, and in this form it can be used in cross-cluster function calls.
  • For performance reasons, the calling cluster caches the schema of remote entities after the initial call. Therefore, changes made to the remote entity might result in a mismatch with the cached schema information, potentially leading to query failures. For more information, see Cross-cluster queries and schema changes.

Limitations of cross-eventhouse function calls

Tabular functions or views can be referenced across eventhouses. The following limitations apply:

  • Remote functions must return tabular schema. Scalar functions can only be accessed in the same eventhouse.
  • Remote functions can accept only scalar arguments. Functions that get one or more table arguments can only be accessed in the same eventhouse.
  • Remote functions’ result schema must be fixed (known in advance without executing parts of the query). So query constructs such as the pivot plugin can’t be used. Some plugins, such as the bag_unpack plugin, support a way to indicate the result schema statically, and in this form it can be used in cross-eventhouse function calls.
  • For performance reasons, the calling eventhouse caches the schema of remote entities after the initial call. Therefore, changes made to the remote entity might result in a mismatch with the cached schema information, potentially leading to query failures. For more information, see Cross-cluster queries and schema changes.

Examples

The following cross-cluster call is valid.

cluster("OtherCluster").database("SomeDb").MyView("exception") | count

The following query calls a remote scalar function MyCalc. This call violates rule #1, so it’s not valid.

MyTable | extend CalCol=cluster("OtherCluster").database("OtherDb").MyCalc(Col1, Col2, Col3) | take 10

The following query calls remote function MyCalc and provides a tabular parameter. This call violates rule #2, so it’s not valid.

cluster("OtherCluster").database("OtherDb").MyCalc(datatable(x:string, y:string)["x","y"] )

The following cross-eventhouse call is valid.

cluster("OtherEventhouseURI").database("SomeDb").EventView("exception") | count

The following query calls a remote scalar function EventCalc. This call violates rule #1, so it’s not valid.

Eventtable | extend CalCol=cluster("OtherEventhouseClusterURI").database("OtherDb").EventCalc(Col1, Col2, Col3) | take 10

The following query calls remote function EventCalc and provides a tabular parameter. This call violates rule #2, so it’s not valid.

cluster("EventhouseClusterURI").database("OtherDb").MyCalc(datatable(x:string, y:string)["x","y"] )

The following query calls remote function SomeTable that has a variable schema output based on the parameter tablename. This call violates rule #3, so it’s not valid.

Tabular function in OtherDb.

.create function SomeTable(tablename:string) { table(tablename)  }  

In the default database.

cluster("OtherCluster").database("OtherDb").SomeTable("MyTable")
cluster("OtherEventhouseClusterURI").database("OtherDb").SomeTable("EventTable")

The following query calls remote function GetDataPivot that has a variable schema output based on the data (pivot() plugin has dynamic output). This call violates rule #3, so it’s not valid.

Tabular function in OtherDb.

.create function GetDataPivot() { T | evaluate pivot(PivotColumn) }  

Tabular function in the default database.

cluster("OtherCluster").database("OtherDb").GetDataPivot()
cluster("OtherEventhouseClusterURI").database("OtherDb").GetDataPivot()

14.3 - database()

Learn how to use the database() function to change the reference of the query to a specific database.

Changes the reference of the query to a specific database within the cluster scope.

Changes the reference of the query to a specific database within the Eventhouse scope.

Syntax

database(databaseName)

Parameters

NameTypeRequiredDescription
databaseNamestringThe name of the database to reference. The databaseName can be either the DatabaseName or PrettyName. The argument must be a constant value and can’t come from a subquery evaluation.

Examples

Use database() to access table of other database

database('Samples').StormEvents | count

Output

Count
59066

Use database() inside let statements

The query above can be rewritten as a query-defined function (let statement) that receives a parameter dbName - which is passed into the database() function.

let foo = (dbName:string)
{
    database(dbName).StormEvents | count
};
foo('Samples')

Output

Count
59066

Use database() inside stored functions

The same query as above can be rewritten to be used in a function that receives a parameter dbName - which is passed into the database() function.

.create function foo(dbName:string)
{
    database(dbName).StormEvents | count
};

14.4 - external_table()

Learn how to use the external_table() function to reference an external table by name.

References an external table by name.

To accelerate queries over external delta tables, see Query acceleration policy.

Syntax

external_table( TableName [, MappingName ] )

Parameters

NameTypeRequiredDescription
TableNamestring✔️The name of the external table being queried. Must reference an external table of kind blob, adl, or sql.
MappingNamestringA name of a mapping object that maps fields in the external data shards to columns output.

Authentication and authorization

The authentication method to access an external table is based on the connection string provided during its creation, and the permissions required to access the table vary depending on the authentication method. For more information, see Azure Storage external table or SQL Server external table.
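
Example

As an illustration, the following minimal sketch queries an external table directly and then with an explicit mapping; MyExternalTable and MyMapping are hypothetical names.

external_table("MyExternalTable") | take 10

external_table("MyExternalTable", "MyMapping") | count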

14.5 - materialize()

Learn how to use the materialize() function to capture the value of a tabular expression for reuse.

Captures the value of a tabular expression for the duration of the query execution so that it can be referenced multiple times by the query without recalculation.

Syntax

materialize(expression)

Parameters

NameTypeRequiredDescription
expressionstring✔️The tabular expression to be evaluated and cached during query execution.

Remarks

The materialize() function is useful in the following scenarios:

  • To speed up queries that perform heavy calculations whose results are used multiple times in the query.
  • To evaluate a tabular expression only once and use it many times in a query. This is commonly required if the tabular expression is non-deterministic. For example, if the expression uses the rand() or the dcount() functions.

Examples of query performance improvement

The following example shows how materialize() can be used to improve performance of the query. The expression _detailed_data is defined using materialize() function and therefore is calculated only once.

let _detailed_data = materialize(StormEvents | summarize Events=count() by State, EventType);
_detailed_data
| summarize TotalStateEvents=sum(Events) by State
| join (_detailed_data) on State
| extend EventPercentage = Events*100.0 / TotalStateEvents
| project State, EventType, EventPercentage, Events
| top 10 by EventPercentage

Output

StateEventTypeEventPercentageEvents
HAWAII WATERSWaterspout1002
LAKE ONTARIOMarine Thunderstorm Wind1008
GULF OF ALASKAWaterspout1004
ATLANTIC NORTHMarine Thunderstorm Wind95.2127659574468179
LAKE ERIEMarine Thunderstorm Wind92.592592592592625
E PACIFICWaterspout909
LAKE MICHIGANMarine Thunderstorm Wind85.1648351648352155
LAKE HURONMarine Thunderstorm Wind79.365079365079450
GULF OF MEXICOMarine Thunderstorm Wind71.7504332755633414
HAWAIIHigh Surf70.0218818380744320

The following example generates a set of random numbers and calculates:

  • How many distinct values in the set (Dcount)
  • The top three values in the set
  • The sum of all these values in the set

This operation can be done using batches and materialize:

let randomSet = 
    materialize(
        range x from 1 to 3000000 step 1
        | project value = rand(10000000));
randomSet | summarize Dcount=dcount(value);
randomSet | top 3 by value;
randomSet | summarize Sum=sum(value)

Result set 1:

Dcount
2578351

Result set 2:

value
9999998
9999998
9999997

Result set 3:

Sum
15002960543563

Examples of using materialize()

To use the let statement with a value that you use more than once, use the materialize() function. Try to push all possible operators that will reduce the materialized dataset and still keep the semantics of the query. For example, use filters, or project only required columns.

    let materializedData = materialize(Table
    | where Timestamp > ago(1d));
    union (materializedData
    | where Text !has "somestring"
    | summarize dcount(Resource1)), (materializedData
    | where Text !has "somestring"
    | summarize dcount(Resource2))

The filter on Text is mutual and can be pushed to the materialize expression. The query only needs columns Timestamp, Text, Resource1, and Resource2. Project these columns inside the materialized expression.

    let materializedData = materialize(Table
    | where Timestamp > ago(1d)
    | where Text !has "somestring"
    | project Timestamp, Resource1, Resource2, Text);
    union (materializedData
    | summarize dcount(Resource1)), (materializedData
    | summarize dcount(Resource2))

If the filters aren’t identical, as in the following query:

    let materializedData = materialize(Table
    | where Timestamp > ago(1d));
    union (materializedData
    | where Text has "String1"
    | summarize dcount(Resource1)), (materializedData
    | where Text has "String2"
    | summarize dcount(Resource2))

When the combined filter reduces the materialized result drastically, combine both filters on the materialized result by a logical or expression as in the following query. However, keep the filters in each union leg to preserve the semantics of the query.

    let materializedData = materialize(Table
    | where Timestamp > ago(1d)
    | where Text has "String1" or Text has "String2"
    | project Timestamp, Resource1, Resource2, Text);
    union (materializedData
    | where Text has "String1"
    | summarize dcount(Resource1)), (materializedData
    | where Text has "String2"
    | summarize dcount(Resource2))

14.6 - materialized_view()

Learn how to use the materialized_view() function to reference the materialized part of a materialized view.

References the materialized part of a materialized view.

The materialized_view() function supports a way of querying the materialized part only of the view, while specifying the max latency the user is willing to tolerate. This option isn’t guaranteed to return the most up-to-date records, but should always be more performant than querying the entire view. This function is useful for scenarios in which you’re willing to sacrifice some freshness for performance, for example in telemetry dashboards.

Syntax

materialized_view(ViewName, [ max_age ] )

Parameters

NameTypeRequiredDescription
ViewNamestring✔️The name of the materialized view.
max_agetimespanIf not provided, only the materialized part of the view is returned. If provided, the function will return the materialized part of the view if last materialization time is greater than @now - max_age. Otherwise, the entire view is returned, which is identical to querying ViewName directly.

Examples

Query the materialized part of the view only, independent on when it was last materialized.

materialized_view("ViewName")

Query the materialized part only if it was materialized in the last 10 minutes. If the materialized part is older than 10 minutes, return the full view. This option is expected to be less performant than querying the materialized part.

materialized_view("ViewName", 10m)

Notes

  • Once a view is created, it can be queried just as any other table in the database, including participating in cross-cluster / cross-database queries.
  • Materialized views aren’t included in wildcard unions or searches.
  • Syntax for querying the view is the view name (like a table reference).
  • Querying the materialized view will always return the most up-to-date results, based on all records ingested to the source table. The query combines the materialized part of the view with all unmaterialized records in the source table. For more information, see how materialized views work.

14.7 - Query results cache

Learn how to use the query results cache functionality to get cached results.

Kusto includes a query results cache. You can choose to get cached results when issuing a query. You’ll experience better query performance and lower resource consumption if your query’s results can be returned by the cache. However, this performance comes at the expense of some “staleness” in the results.

Use the cache

Set the query_results_cache_max_age option as part of the query to use the query results cache. You can set this option in the query text or as a client request property. For example:

set query_results_cache_max_age = time(5m);
GithubEvent
| where CreatedAt > ago(180d)
| summarize arg_max(CreatedAt, Type) by Id

The option value is a timespan that indicates the maximum “age” of the results cache, measured from the query start time. Beyond the set timespan, the cache entry is obsolete and won’t be used again. Setting a value of 0 is equivalent to not setting the option.

Compatibility between queries

Identical queries

The query results cache returns results only for queries that are considered “identical” to a previous cached query. Two queries are considered identical if all of the following conditions are met:

  • The two queries have the same representation (as UTF-8 strings).
  • The two queries are made to the same database.
  • The two queries share the same client request properties. The following properties are ignored for caching purposes:
    • ClientRequestId
    • Application
    • User

Incompatible queries

The query results won’t be cached if any of the following conditions is true:

No valid cache entry

If a cached result satisfying the time constraints couldn’t be found, or there isn’t a cached result from an “identical” query in the cache, the query will be executed and its results cached, as long as:

  • The query execution completes successfully, and
  • The query results size doesn’t exceed 16 MB.

Results from the cache

How does the service indicate that the query results are being served from the cache? When responding to a query, Kusto sends another ExtendedProperties response table that includes a Key column and a Value column. Cached query results will have another row appended to that table:

  • The row’s Key column will contain the string ServerCache
  • The row’s Value column will contain a property bag with two fields:
    • OriginalClientRequestId - Specifies the original request’s ClientRequestId.
    • OriginalStartedOn - Specifies the original request’s execution start time.
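
For illustration only, such an appended row might look roughly like the following; the identifier and timestamp are hypothetical values.

Key: ServerCache
Value: {"OriginalClientRequestId": "KE.RunQuery;1234abcd-0000-0000-0000-000000000000", "OriginalStartedOn": "2024-05-01T08:30:00.0000000Z"}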

Query consistency

Queries using weak consistency can be processed on different cluster nodes. The cache isn’t shared by cluster nodes, every node has a dedicated cache in its own private storage. Therefore, if two identical queries land on different nodes, the query will be executed and cached on both nodes. By setting query consistency to affinitizedweakconsistency, you can ensure that weak consistency queries that are identical land on the same query head, and thus increase the cache hit rate. This is not relevant when using strong consistency.
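
For example, a hedged sketch of requesting affinitized weak consistency together with the results cache, assuming the queryconsistency option can be set in the query text (it can also be passed as a client request property):

set queryconsistency = affinitizedweakconsistency;
set query_results_cache_max_age = time(5m);
GithubEvent
| where CreatedAt > ago(180d)
| summarize arg_max(CreatedAt, Type) by Id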

Management

The following management and observability commands are supported:

  • Show query results cache: Returns statistics related to the query results cache.
  • Clear query results cache: Clears query results cache.
  • Refresh query cache entry: a specific query cache entry can be refreshed using the query_results_cache_force_refresh (OptionQueryResultsCacheForceRefresh) client request property. When set to true, this property forces the query results cache to refresh even when a valid cache entry is present. This is useful in scenarios that require query results to be available for querying. This property must be used in combination with ‘query_results_cache_max_age’, and sent via the ClientRequestProperties object. The property can’t be part of a ‘set’ statement.

Capacity

The cache capacity is currently fixed at 1 GB per cluster node. The eviction policy is LRU.

Shard level query results cache

You can use shard-level query results cache for scenarios that require the most up-to-date results, such as a live dashboard. For example, a query that runs every 10 seconds and spans the last 1 hour can benefit from caching intermediate query results at the storage (shard) level.

The shard level query results cache is automatically enabled when the Query results cache is in use. Because it shares the same cache as Query results cache, the same capacity and eviction policies apply.

Syntax

set query_results_cache_per_shard; Query

Example

set query_results_cache_per_shard;
GithubEvent
| where CreatedAt > ago(180d)
| summarize arg_max(CreatedAt, Type) by Id

14.8 - stored_query_result()

Learn how to use the stored_query_result() function to reference a stored query result.

Retrieves a previously created stored query result.

To set a stored query result, see .set stored_query_result command.

Syntax

stored_query_result( StoredQueryResultName )

Parameters

NameTypeRequiredDescription
StoredQueryResultNamestring✔️The name of the stored query result.

Examples

References the stored query result named Numbers.

stored_query_result("Numbers")

Output

X
1
2
3
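
For context, a stored query result such as Numbers could have been created beforehand with the .set stored_query_result command; the following is a minimal sketch that produces the three rows shown above.

.set stored_query_result Numbers <| range X from 1 to 3 step 1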

Pagination

The following example retrieves clicks by Ad network and day, for the last seven days:

.set stored_query_result DailyClicksByAdNetwork7Days with (previewCount = 100) <|
Events
| where Timestamp > ago(7d)
| where EventType == 'click'
| summarize Count=count() by Day=bin(Timestamp, 1d), AdNetwork
| order by Count desc
| project Num=row_number(), Day, AdNetwork, Count

Output

NumDayAdNetworkCount
12020-01-01 00:00:00.0000000NeoAds1002
22020-01-01 00:00:00.0000000HighHorizons543
32020-01-01 00:00:00.0000000PieAds379

Retrieve the next page:

stored_query_result("DailyClicksByAdNetwork7Days")
| where Num between(100 .. 200)

Output

NumDayAdNetworkCount
1002020-01-01 00:00:00.0000000CoolAds301
1012020-01-01 00:00:00.0000000DreamAds254
1022020-01-02 00:00:00.0000000SuperAds123

14.9 - table()

Learn how to use the table() function to reference a table.

The table() function references a table by providing its name as an expression of type string.

Syntax

table( TableName [, DataScope] )

Parameters

NameTypeRequiredDescription
TableNamestring✔️The name of the table being referenced. The value of this expression must be constant at the point of call to the function, meaning it cannot vary by the data context.
DataScopestringUsed to restrict the table reference to data according to how this data falls under the table’s effective cache policy. If used, the actual argument must be one of the Valid data scope values.

Valid data scope values

ValueDescription
hotcacheOnly data that is categorized as hot cache will be referenced.
allAll the data in the table will be referenced.
defaultThe default is all, except if it has been set to hotcache by the cluster admin.

Returns

table(T) returns:

  • Data from table T if a table named T exists.
  • Data returned by function T if a table named T doesn’t exist but a function named T exists. Function T must take no arguments and must return a tabular result.
  • A semantic error is raised if there’s no table named T and no function named T.

Examples

Use table() to access table of the current database

table('StormEvents') | count

Output

Count
59066

Use table() inside let statements

The query above can be rewritten as a query-defined function (let statement) that receives a parameter tableName - which is passed into the table() function.

let foo = (tableName:string)
{
    table(tableName) | count
};
foo('StormEvents')

Output

Count
59066

Use table() inside Functions

The same query as above can be rewritten to be used in a function that receives a parameter tableName - which is passed into the table() function.

.create function foo(tableName:string)
{
    table(tableName) | count
};

Use table() with non-constant parameter

A parameter that isn’t a scalar constant string can’t be passed to the table() function.

The following example shows a workaround for such a case.

let T1 = print x=1;
let T2 = print x=2;
let _choose = (_selector:string)
{
    union
    (T1 | where _selector == 'T1'),
    (T2 | where _selector == 'T2')
};
_choose('T2')

Output

x
2

14.10 - toscalar()

Learn how to use the toscalar() function to return a scalar constant value of the evaluated expression.

Returns a scalar constant value of the evaluated expression.

This function is useful for queries that require staged calculations. For example, calculate a total count of events, and then use the result to filter groups that exceed a certain percent of all events.

Any two statements must be separated by a semicolon.

Syntax

toscalar(expression)

Parameters

NameTypeRequiredDescription
expressionstring✔️The value to convert to a scalar value.

Returns

A scalar constant value of the evaluated expression. If the result is tabular, the value in the first column of the first row is taken for conversion.

Limitations

toscalar() can’t be applied on a scenario that applies the function on each row. This is because the function can only be calculated a constant number of times during the query execution. Usually, when this limitation is hit, the following error will be returned: can't use '<column name>' as it is defined outside its row-context scope.

In the following example, the query fails with the error:

let _dataset1 = datatable(x:long)[1,2,3,4,5];
let _dataset2 = datatable(x:long, y:long) [ 1, 2, 3, 4, 5, 6];
let tg = (x_: long)
{
    toscalar(_dataset2| where x == x_ | project y);
};
_dataset1
| extend y = tg(x)

This failure can be mitigated by using the join operator, as in the following example:

let _dataset1 = datatable(x: long)[1, 2, 3, 4, 5];
let _dataset2 = datatable(x: long, y: long) [1, 2, 3, 4, 5, 6];
_dataset1
| join (_dataset2) on x 
| project x, y

Output

xy
12
34
56

Examples

Evaluate Start, End, and Step as scalar constants, and use the result for range evaluation.

let Start = toscalar(print x=1);
let End = toscalar(range x from 1 to 9 step 1 | count);
let Step = toscalar(2);
range z from Start to End step Step | extend start=Start, end=End, step=Step

Output

zstartendstep
1192
3192
5192
7192
9192
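
The following sketch illustrates the staged calculation described at the start of this section: compute the total event count once, then use it to keep only states that account for more than five percent of all events (the five percent threshold is arbitrary).

let TotalEvents = toscalar(StormEvents | count);
StormEvents
| summarize EventCount = count() by State
| where EventCount * 100.0 / TotalEvents > 5
| project State, EventCount, Percentage = round(EventCount * 100.0 / TotalEvents, 2)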

The following example shows how toscalar can be used to “fix” an expression so that it will be calculated precisely once. In this case, the expression being calculated returns a different value per evaluation.

let g1 = toscalar(new_guid());
let g2 = new_guid();
range x from 1 to 2 step 1
| extend x=g1, y=g2

Output

xy
e6a15e72-756d-4c93-93d3-fe85c18d19a3c2937642-0d30-4b98-a157-a6706e217620
e6a15e72-756d-4c93-93d3-fe85c18d19a3c6a48cb3-9f98-4670-bf5b-589d0e0dcaf5

15 - Tabular operators

15.1 - Join operator

15.1.1 - join flavors

15.1.1.1 - fullouter join

Learn how to use the fullouter join flavor to merge the rows of two tables.

A fullouter join combines the effect of applying both left and right outer-joins. For columns of the table that lack a matching row, the result set contains null values. For those records that do match, a single row is produced in the result set containing fields populated from both tables.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=fullouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from both tables with unmatched cells populated with null.

Example

This example query combines rows from both tables X and Y, filling in missing values with NULL where there’s no match in the other table. This allows you to see all possible combinations of keys from both tables.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=fullouter Y on Key

Output

KeyValue1Key1Value2
b3b10
b2b10
c4c20
c4c30
d40
a1

15.1.1.2 - inner join

Learn how to use the inner join flavor to merge the rows of two tables.

The inner join flavor is like the standard inner join from the SQL world. An output record is produced whenever a record on the left side has the same join key as the record on the right side.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=inner [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: Only matching rows from both tables.

Example

The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'k',5,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40,
    'k',50
];
X | join kind=inner Y on Key

Output

KeyValue1Key1Value2
b3b10
b2b10
c4c20
c4c30
k5k50

15.1.1.3 - innerunique join

Learn how to use the innerunique join flavor to merge the rows of two tables.

The innerunique join flavor removes duplicate keys from the left side. This behavior ensures that the output contains a row for every combination of unique left and right keys.

By default, the innerunique join flavor is used if the kind parameter isn’t specified. This default implementation is useful in log/trace analysis scenarios, where you aim to correlate two events based on a shared correlation ID. It allows you to retrieve all instances of the phenomenon while disregarding duplicate trace records that contribute to the correlation.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=innerunique [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All deduplicated rows from the left table that match rows from the right table.

Examples

Review the examples and run them in your Data Explorer query page.

Use the default innerunique join

The example query combines rows from tables X and Y where the keys match, showing only the rows that exist in both tables.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join Y on Key

Output

KeyValue1Key1Value2
b2b10
c4c20
c4c30

The query executed the default join, which is an inner join after deduplicating the left side based on the join key. The deduplication keeps only the first record. The resulting left side of the join after deduplication is:

KeyValue1
a1
b2
c4

Two possible outputs from innerunique join

let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3",
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key

Output

keyvaluekey1value1
1val1.11val1.3
1val1.11val1.4
let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3", 
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key

Output

keyvaluekey1value1
1val1.21val1.3
1val1.21val1.4
  • Kusto is optimized to push filters that come after the join, towards the appropriate join side, left or right, when possible.
  • Sometimes, the flavor used is innerunique and the filter is propagated to the left side of the join. The flavor is automatically propagated and the keys that apply to that filter appear in the output.
  • Use the previous example and add a filter where value == "val1.2" . It gives the second result and will never give the first result for the datasets:
let t1 = datatable(key: long, value: string)  
    [
    1, "val1.1",  
    1, "val1.2"  
];
let t2 = datatable(key: long, value: string)  
    [  
    1, "val1.3", 
    1, "val1.4"  
];
t1
| join kind = innerunique
    t2
    on key
| where value == "val1.2"

Output

keyvaluekey1value1
1val1.21val1.3
1val1.21val1.4

Get extended sign-in activities

Get extended activities from a login that some entries mark as the start and end of an activity.

let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityId, StartTime=timestamp
| join (Events
    | where Name == "Stop"
        | project StopTime=timestamp, ActivityId)
    on ActivityId
| project City, ActivityId, StartTime, StopTime, Duration = StopTime - StartTime

let Events = MyLogTable | where type=="Event" ;
Events
| where Name == "Start"
| project Name, City, ActivityIdLeft = ActivityId, StartTime=timestamp
| join (Events
        | where Name == "Stop"
        | project StopTime=timestamp, ActivityIdRight = ActivityId)
    on $left.ActivityIdLeft == $right.ActivityIdRight
| project City, ActivityId = ActivityIdLeft, StartTime, StopTime, Duration = StopTime - StartTime

15.1.1.4 - leftanti join

Learn how to use the leftanti join flavor to merge the rows of two tables.

The leftanti join flavor returns all records from the left side that don’t match any record from the right side. The anti join models the “NOT IN” query.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftanti [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the left table.
Rows: All records from the left table that don’t match records from the right table.

Example

The example query combines rows from tables X and Y where there is no match in Y for the keys in X, effectively filtering out any rows in X that have corresponding rows in Y.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftanti Y on Key

Output

KeyValue1
a1

15.1.1.5 - leftouter join

Learn how to use the leftouter join flavor to merge the rows of two tables.

The leftouter join flavor returns all the records from the left side table and only matching records from the right side table.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from the left table and only matching rows from the right table.

Example

The result of a left outer join for tables X and Y always contains all records of the left table (X), even if the join condition doesn’t find any matching record in the right table (Y).

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftouter Y on Key

Output

KeyValue1Key1Value2
a1
b2b10
b3b10
c4c20
c4c30

15.1.1.6 - leftsemi join

Learn how to use the leftsemi join flavor to merge the rows of two tables.

The leftsemi join flavor returns all records from the left side that match a record from the right side. Only columns from the left side are returned.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=leftsemi [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the left table.
Rows: All records from the left table that match records from the right table.

Example

This query filters and returns only those rows from table X that have a matching key in table Y.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=leftsemi Y on Key

Output

|Key|Value1|
|---|---|
|b|2|
|b|3|
|c|4|

15.1.1.7 - rightanti join

Learn how to use the rightanti join flavor to merge the rows of two tables.

The rightanti join flavor returns all records from the right side that don’t match any record from the left side. The anti join models the “NOT IN” query.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightanti [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the right table.
Rows: All records from the right table that don’t match records from the left table.

Example

This query filters and returns only those rows from table Y that do not have a matching key in table X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightanti Y on Key

Output

|Key|Value2|
|---|---|
|d|40|

15.1.1.8 - rightouter join

Learn how to use the rightouter join flavor to merge the rows of two tables.

The rightouter join flavor returns all the records from the right side and only matching records from the left side. This join flavor resembles the leftouter join flavor, but the treatment of the tables is reversed.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightouter [ Hints ] RightTable on Conditions

Returns

Schema: All columns from both tables, including the matching keys.
Rows: All records from the right table and only matching rows from the left table.

Example

This query returns all rows from table Y and any matching rows from table X, filling in NULL values where there is no match from X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightouter Y on Key

Output

|Key|Value1|Key1|Value2|
|---|---|---|---|
|b|2|b|10|
|b|3|b|10|
|c|4|c|20|
|c|4|c|30|
|||d|40|

15.1.1.9 - rightsemi join

Learn how to use the rightsemi join flavor to merge the rows of two tables.

The rightsemi join flavor returns all records from the right side that match a record from the left side. Only columns from the right side are returned.

Diagram that shows how the join works.

Syntax

LeftTable | join kind=rightsemi [ Hints ] RightTable on Conditions

Returns

Schema: All columns from the right table.
Rows: All records from the right table that match records from the left table.

Example

This query filters and returns only those rows from table Y that have a matching key in table X.

let X = datatable(Key:string, Value1:long)
[
    'a',1,
    'b',2,
    'b',3,
    'c',4
];
let Y = datatable(Key:string, Value2:long)
[
    'b',10,
    'c',20,
    'c',30,
    'd',40
];
X | join kind=rightsemi Y on Key

Output

|Key|Value2|
|---|---|
|b|10|
|c|20|
|c|30|

15.1.2 - Broadcast join

Learn how to use the broadcast join execution strategy to distribute the join over nodes.

Regular joins are executed on a single cluster node. Broadcast join is an execution strategy that distributes the join over the cluster (or Eventhouse) nodes. This strategy is useful when the left side of the join is small (up to several tens of megabytes); in this case, a broadcast join is more performant than a regular join.

Use the lookup operator if the right side is smaller than the left side. The lookup operator runs in broadcast strategy by default when the right side is smaller than the left.
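
For example, the following minimal sketch (using hypothetical FactSales and DimProduct table names) keeps the large table on the left and uses lookup to enrich it from the small dimension table on the right:

FactSales
| lookup kind=leftouter (DimProduct) on ProductKey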

If the left side of the join is a small dataset, you can run the join in broadcast mode by using the following syntax (hint.strategy = broadcast):

leftSide 
| join hint.strategy = broadcast (factTable) on key

The performance improvement is more noticeable in scenarios where the join is followed by other operators, such as summarize. For example:

leftSide 
| join hint.strategy = broadcast (factTable) on Key
| summarize dcount(Messages) by Timestamp, Key

15.1.3 - Cross-cluster join

Learn how to perform the Cross-cluster join operation to join datasets residing on different clusters.

A cross-cluster join involves joining data from datasets that reside in different clusters.

In a cross-cluster join, the query can be executed in three possible locations, each with a specific designation for reference throughout this document:

  • Local cluster: The cluster to which the request is sent, which is also known as the cluster hosting the database in context.
  • Left cluster: The cluster hosting the data on the left side of the join operation.
  • Right cluster: The cluster hosting the data on the right side of the join operation.

The cluster that runs the query fetches the data from the other cluster.

Syntax

[ cluster(ClusterName).database(DatabaseName).]LeftTable
| join [ hint.remote=Strategy ] (
    [ cluster(ClusterName).database(DatabaseName).]RightTable
) on Conditions

Parameters

|Name|Type|Required|Description|
|---|---|---|---|
|LeftTable|string|✔️|The left table or tabular expression whose rows are to be merged. Denoted as $left.|
|Strategy|string||Determines the cluster on which to execute the join. Supported values are: left, right, local, and auto. For more information, see Strategies.|
|ClusterName|string||If the data for the join resides outside of the local cluster, use the cluster() function to specify the cluster.|
|DatabaseName|string||If the data for the join resides outside of the local database context, use the database() function to specify the database.|
|RightTable|string|✔️|The right table or tabular expression whose rows are to be merged. Denoted as $right.|
|Conditions|string|✔️|Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left.LeftColumn == $right.RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator.|

Strategies

The following list explains the supported values for the Strategy parameter:

  • left: Execute join on the cluster of the left table, or left cluster.
  • right: Execute join on the cluster of the right table, or right cluster.
  • local: Execute join on the current cluster, also known as the local cluster.
  • auto: (Default) Kusto makes the remoting decision.

How the auto strategy works

By default, the auto strategy determines where the cross-cluster join is executed based on the following rules:

  • If one of the tables is hosted in the local cluster, then the join is performed on the local cluster. For example, with the auto strategy, this query is executed on the local cluster:

    T | ... | join (cluster("B").database("DB").T2 | ...) on Col1
    
  • If both tables are hosted outside of the local cluster, then the join is performed on the right cluster. For example, assuming neither cluster is the local cluster, the join would be executed on the right cluster:

    cluster("B").database("DB").T | ... | join (cluster("C").database("DB2").T2 | ...) on Col1
    

Performance considerations

For optimal performance, we recommend running the query on the cluster that contains the largest table.

In the following example, if the dataset produced by T | ... is smaller than the one produced by cluster("B").database("DB").T2 | ..., then it would be more efficient to execute the join operation on cluster B, in this case the right cluster, instead of on the local cluster.

T | ... | join (cluster("B").database("DB").T2 | ...) on Col1

You can rewrite the query to use hint.remote=right to optimize performance. In this way, the join operation is performed on the right cluster, even if the left table is in the local cluster.

T | ... | join hint.remote=right (cluster("B").database("DB").T2 | ...) on Col1

15.1.4 - join operator

Learn how to use the join operator to merge the rows of two tables.

Merge the rows of two tables to form a new table by matching values of the specified columns from each table.

Kusto Query Language (KQL) offers many kinds of joins that each affect the schema and rows in the resultant table in different ways. For example, if you use an inner join, the table has the same columns as the left table, plus the columns from the right table. For best performance, if one table is always smaller than the other, use it as the left side of the join operator.

The following image provides a visual representation of the operation performed by each join. The color of the shading represents the columns returned, and the areas shaded represent the rows returned.

Diagram showing query join kinds.

Syntax

LeftTable | join [ kind = JoinFlavor ] [ Hints ] (RightTable) on Conditions

Parameters

|Name|Type|Required|Description|
|---|---|---|---|
|LeftTable|string|✔️|The left table or tabular expression, sometimes called the outer table, whose rows are to be merged. Denoted as $left.|
|JoinFlavor|string||The type of join to perform: innerunique, inner, leftouter, rightouter, fullouter, leftanti, rightanti, leftsemi, rightsemi. The default is innerunique. For more information about join flavors, see Returns.|
|Hints|string||Zero or more space-separated join hints in the form of Name = Value that control the behavior of the row-match operation and execution plan. For more information, see Hints.|
|RightTable|string|✔️|The right table or tabular expression, sometimes called the inner table, whose rows are to be merged. Denoted as $right.|
|Conditions|string|✔️|Determines how rows from LeftTable are matched with rows from RightTable. If the columns you want to match have the same name in both tables, use the syntax ON ColumnName. Otherwise, use the syntax ON $left.LeftColumn == $right.RightColumn. To specify multiple conditions, you can either use the “and” keyword or separate them with commas. If you use commas, the conditions are evaluated using the “and” logical operator.|
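
For instance, to match on more than one column, separate the conditions with commas. The following minimal sketch assumes two hypothetical tables, T1 and T2, that share DeviceId and SessionId columns:

T1
| join kind=inner (T2) on DeviceId, SessionId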

Hints

|Hint key|Values|Description|
|---|---|---|
|hint.remote|auto, left, local, right|See Cross-Cluster Join|
|hint.strategy=broadcast|Specifies the way to share the query load on cluster nodes.|See broadcast join|
|hint.shufflekey=<key>|The shufflekey query shares the query load on cluster nodes, using a key to partition data.|See shuffle query|
|hint.strategy=shuffle|The shuffle strategy query shares the query load on cluster nodes, where each node processes one partition of the data.|See shuffle query|
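
For example, when both sides of the join are large and share a high-cardinality key, the join can be shuffled by that key. The following minimal sketch reuses the hypothetical leftSide and factTable names from the broadcast join section:

leftSide
| join hint.shufflekey = Key (factTable) on Key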

Returns

The return schema and rows depend on the join flavor. The join flavor is specified with the kind keyword. The following list shows the supported join flavors. To see examples for a specific join flavor, see its dedicated section.

  • innerunique (default): Inner join with left side deduplication.
    Schema: All columns from both tables, including the matching keys.
    Rows: All deduplicated rows from the left table that match rows from the right table.
  • inner: Standard inner join.
    Schema: All columns from both tables, including the matching keys.
    Rows: Only matching rows from both tables.
  • leftouter: Left outer join.
    Schema: All columns from both tables, including the matching keys.
    Rows: All records from the left table and only matching rows from the right table.
  • rightouter: Right outer join.
    Schema: All columns from both tables, including the matching keys.
    Rows: All records from the right table and only matching rows from the left table.
  • fullouter: Full outer join.
    Schema: All columns from both tables, including the matching keys.
    Rows: All records from both tables with unmatched cells populated with null.
  • leftsemi: Left semi join.
    Schema: All columns from the left table.
    Rows: All records from the left table that match records from the right table.
  • leftanti, anti, leftantisemi: Left anti join and semi variant.
    Schema: All columns from the left table.
    Rows: All records from the left table that don’t match records from the right table.
  • rightsemi: Right semi join.
    Schema: All columns from the right table.
    Rows: All records from the right table that match records from the left table.
  • rightanti, rightantisemi: Right anti join and semi variant.
    Schema: All columns from the right table.
    Rows: All records from the right table that don’t match records from the left table.

Cross-join

KQL doesn’t provide a cross-join flavor. However, you can achieve a cross-join effect by using a placeholder key approach.

In the following example, a placeholder key is added to both tables and then used for the inner join operation, effectively achieving a cross-join-like behavior:

X | extend placeholder=1 | join kind=inner (Y | extend placeholder=1) on placeholder
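
The following self-contained sketch applies the same technique to two small inline tables and then removes the helper columns (the right-side copy of placeholder is automatically renamed placeholder1), producing every Suit and Rank combination:

let X = datatable(Suit:string) ['Hearts', 'Spades'];
let Y = datatable(Rank:string) ['Ace', 'King', 'Queen'];
X
| extend placeholder=1
| join kind=inner (Y | extend placeholder=1) on placeholder
| project-away placeholder, placeholder1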

15.1.5 - Joining within time window

Learn how to perform a time window join operation to match between two large datasets.

It’s often useful to join between two large datasets on some high-cardinality key, such as an operation ID or a session ID, and further limit the right-hand-side ($right) records that need to match up with each left-hand-side ($left) record by adding a restriction on the “time-distance” between datetime columns on the left and on the right.

The above operation differs from the usual join operation, since for the equi-join part of matching the high-cardinality key between the left and right datasets, the system can also apply a distance function and use it to considerably speed up the join.

Example to identify event sequences without time window

To identify event sequences within a relatively small time window, this example uses a table T with the following schema:

  • SessionId: A column of type string with correlation IDs.
  • EventType: A column of type string that identifies the event type of the record.
  • Timestamp: A column of type datetime indicates when the event described by the record happened.

|SessionId|EventType|Timestamp|
|---|---|---|
|0|A|2017-10-01T00:00:00Z|
|0|B|2017-10-01T00:01:00Z|
|1|B|2017-10-01T00:02:00Z|
|1|A|2017-10-01T00:03:00Z|
|3|A|2017-10-01T00:04:00Z|
|3|B|2017-10-01T00:10:00Z|

The following query creates the dataset and then identifies all the session IDs in which event type A was followed by an event type B within a 1min time window.

let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
    '0', 'A', datetime(2017-10-01 00:00:00),
    '0', 'B', datetime(2017-10-01 00:01:00),
    '1', 'B', datetime(2017-10-01 00:02:00),
    '1', 'A', datetime(2017-10-01 00:03:00),
    '3', 'A', datetime(2017-10-01 00:04:00),
    '3', 'B', datetime(2017-10-01 00:10:00),
];
T
| where EventType == 'A'
| project SessionId, Start=Timestamp
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Timestamp
    ) on SessionId
| where (End - Start) between (0min .. 1min)
| project SessionId, Start, End 

Output

|SessionId|Start|End|
|---|---|---|
|0|2017-10-01 00:00:00.0000000|2017-10-01 00:01:00.0000000|

Example optimized with time window

To optimize this query, we can rewrite it so that the time window is expressed as a join key. Rewrite the query so that the datetime values are “discretized” into buckets whose size is half the size of the time window, and use equi-join to compare the bucket IDs.

The query finds pairs of events within the same session (SessionId) where an ‘A’ event is followed by a ‘B’ event within 1 minute. It projects the session ID, the start time of the ‘A’ event, and the end time of the ‘B’ event.
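
For example, with lookupWindow = 1min and lookupBin = 30s, an 'A' event at 00:00:45 gets TimeKey = bin(00:00:45, 30s) = 00:00:30, while a 'B' event at 00:01:10 is expanded to the TimeKeys 00:00:00, 00:00:30, and 00:01:00 (the range from bin(00:01:10 - 1min, 30s) to bin(00:01:10, 30s)). The two events share the bucket 00:00:30, so the equi-join matches them, and the final where clause keeps the pair because the events are 25 seconds apart.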

let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
[
    '0', 'A', datetime(2017-10-01 00:00:00),
    '0', 'B', datetime(2017-10-01 00:01:00),
    '1', 'B', datetime(2017-10-01 00:02:00),
    '1', 'A', datetime(2017-10-01 00:03:00),
    '3', 'A', datetime(2017-10-01 00:04:00),
    '3', 'B', datetime(2017-10-01 00:10:00),
];
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T 
| where EventType == 'A'
| project SessionId, Start=Timestamp, TimeKey = bin(Timestamp, lookupBin)
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Timestamp,
              TimeKey = range(bin(Timestamp-lookupWindow, lookupBin),
                              bin(Timestamp, lookupBin),
                              lookupBin)
    | mv-expand TimeKey to typeof(datetime)
    ) on SessionId, TimeKey 
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End 

Output

|SessionId|Start|End|
|---|---|---|
|0|2017-10-01 00:00:00.0000000|2017-10-01 00:01:00.0000000|

5 million data query

The next query emulates an extensive dataset of 5M records and approximately 1M Session IDs and runs the query with the time window technique.

let T = range x from 1 to 5000000 step 1
| extend SessionId = rand(1000000), EventType = rand(3), Time=datetime(2017-01-01)+(x * 10ms)
| extend EventType = case(EventType < 1, "A",
                          EventType < 2, "B",
                          "C");
let lookupWindow = 1min;
let lookupBin = lookupWindow / 2.0;
T 
| where EventType == 'A'
| project SessionId, Start=Time, TimeKey = bin(Time, lookupBin)
| join kind=inner
    (
    T 
    | where EventType == 'B'
    | project SessionId, End=Time, 
              TimeKey = range(bin(Time-lookupWindow, lookupBin), 
                              bin(Time, lookupBin),
                              lookupBin)
    | mv-expand TimeKey to typeof(datetime)
    ) on SessionId, TimeKey 
| where (End - Start) between (0min .. lookupWindow)
| project SessionId, Start, End 
| count 

Output

Count
3344

15.2 - Render operator

15.2.1 - visualizations

15.2.1.1 - Anomaly chart visualization

This article describes the anomaly chart visualization.

The anomaly chart visualization is similar to a timechart, but highlights anomalies using the series_decompose_anomalies function.

Syntax

T | render anomalychart [with ( propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ysplitHow to split the visualization into multiple y-axis values. For more information, see Multiple y-axes.
ytitleThe title of the y-axis (of type string).
anomalycolumnsComma-delimited list of columns, which will be considered as anomaly series and displayed as points on the chart

ysplit property

This visualization supports splitting into multiple y-axis values. The supported values of this property are:

ysplitDescription
noneA single y-axis is displayed for all series data. (Default)
axesA single chart is displayed with multiple y-axes (one per series).
panelsOne chart is rendered for each ycolumn value. Maximum five panels.

Example

The example in this section shows how to use the syntax to help you get started.

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') //use "| render anomalychart with anomalycolumns=anomalies" to render the anomalies as bold points on the series charts.

Screenshot of anomaly chart output.

15.2.1.2 - Area chart visualization

This article describes the area chart visualization.

The area chart visual shows a time-series relationship. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.

Syntax

T | render areachart [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
kindFurther elaboration of the visualization kind. For more information, see kind property.
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ysplitHow to split the y-axis values for multiple visualizations.
ytitleThe title of the y-axis (of type string).

ysplit property

This visualization supports splitting into multiple y-axis values:

ysplitDescription
noneA single y-axis is displayed for all series data. (Default)
axesA single chart is displayed with multiple y-axes (one per series).
panelsOne chart is rendered for each ycolumn value. Maximum five panels.

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

kind valueDescription
defaultEach “area” stands on its own.
unstackedSame as default.
stackedStack “areas” to the right.
stacked100Stack “areas” to the right and stretch each one to the same width as the others.

Examples

The examples in this section show how to use the syntax to help you get started.

Simple area chart

The following example shows a basic area chart visualization.

demo_series3
| render areachart

Screenshot of area chart visualization.

Area chart using properties

The following example shows an area chart using multiple property settings.

OccupancyDetection
| summarize avg_temp= avg(Temperature), avg_humidity= avg(Humidity) by bin(Timestamp, 1h)
| render areachart
    with ( 
        kind = unstacked,
        legend = visible,
        ytitle ="Sample value",
        ymin = 10,
        ymax =100,
        xtitle = "Time",    
        title ="Humidity and temperature"
    )

Screenshot of area chart visualization with properties.

Area chart using split panels

The following example shows an area chart using split panels. In this example, the ysplit property is set to panels.

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count=count() by State, bin(StartTime, 1d)
| render areachart
    with (
        ysplit= panels,
        legend = visible,
        ycolumns=count,
        yaxis =log,
        ytitle ="Count",
        ymin = 0,
        ymax =100,
        xaxis = linear,
        xcolumn = StartTime,
        xtitle = "Date",    
        title ="Hail events"
    )

Screenshot of area chart visualization with split panels.

15.2.1.3 - Bar chart visualization

This article describes the bar chart visualization.

The bar chart visual needs a minimum of two columns in the query result. By default, the first column is used as the y-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the x-axis and contain numeric data types to be displayed as horizontal lines. Bar charts are used mainly for comparing numeric and nominal discrete values, where the length of each line represents its value.

Syntax

T | render barchart [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors (true or false).
kindFurther elaboration of the visualization kind. For more information, see kind property.
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).
ysplitHow to split the visualization into multiple y-axis values. For more information, see ysplit property.

ysplit property

This visualization supports splitting into multiple y-axis values:

ysplitDescription
noneA single y-axis is displayed for all series data. This is the default.
axesA single chart is displayed with multiple y-axes (one per series).
panelsOne chart is rendered for each ycolumn value. Maximum five panels.

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

kind valueDescription
defaultEach “bar” stands on its own.
unstackedSame as default.
stackedStack “bars”.
stacked100Stack “bars” and stretch each one to the same width as the others.

Examples

The examples in this section show how to use the syntax to help you get started.

Render a bar chart

The following query creates a bar chart displaying the number of storm events for each state, filtering only those states with more than 10 events. The chart provides a visual representation of the event distribution across different states.

StormEvents
| summarize event_count=count() by State
| project State, event_count
| render barchart
    with (
    title="Storm count by state",
    ytitle="Storm count",
    xtitle="State",
    legend=hidden
    )

Screenshot of a labeled bar chart.

Render a stacked bar chart

The following query creates a stacked bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. Each bar represents a storm event type, and the stacked bars show the breakdown of storm events by state within each type.

StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked)

Screenshot of a stacked bar chart visualization.

Render a stacked100 bar chart

The following query creates a stacked100 bar chart that shows the total count of storm events by their type for selected states of Texas, California, and Florida. The chart shows the distribution of storm events across states within each type. Although the stacks visually sum up to 100, the values actually represent the number of events, not percentages. This visualization is helpful for understanding both the percentages and the actual event counts.

StormEvents
| where State in ("TEXAS", "CALIFORNIA", "FLORIDA")
| summarize EventCount = count() by EventType, State
| order by EventType asc, State desc
| render barchart with (kind=stacked100)

Screenshot of a stacked 100 bar chart visualization.

Use the ysplit property

The following query provides a daily summary of storm-related injuries and deaths, visualized as a bar chart with split axes/panels for better comparison.

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=axes)

Screenshot of column chart using ysplit axes property.

To split the view into separate panels, specify panels instead of axes:

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render barchart with (ysplit=panels)

Screenshot of column chart using ysplit panels property.

15.2.1.4 - Card visualization

This article describes the card visualization.

The card visual shows only one element. If there are multiple columns and rows in the output, the first result record is treated as a set of scalar values and shown as a card.

Syntax

T | render card [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
titleThe title of the visualization (of type string).

Example

This query provides a count of flood events in Virginia and displays the result in a card format.

StormEvents
| where State=="VIRGINIA" and EventType=="Flood"
| count
| render card with (title="Floods in Virginia")

Screenshot of card visual.

15.2.1.5 - Column chart visualization

This article describes the column chart visualization.

The column chart visual needs a minimum of two columns in the query result. By default, the first column is used as the x-axis. This column can contain text, datetime, or numeric data types. The other columns are used as the y-axis and contain numeric data types to be displayed as vertical lines. Column charts are used for comparing specific sub category items in a main category range, where the length of each line represents its value.

Syntax

T | render columnchart [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
kindFurther elaboration of the visualization kind. For more information, see kind property.
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).
ysplitHow to split the visualization into multiple y-axis values. For more information, see ysplit property.

ysplit property

This visualization supports splitting into multiple y-axis values:

ysplitDescription
noneA single y-axis is displayed for all series data. This is the default.
axesA single chart is displayed with multiple y-axes (one per series).
panelsOne chart is rendered for each ycolumn value. Maximum five panels.

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

kind valueDefinition
defaultEach “column” stands on its own.
unstackedSame as default.
stackedStack “columns” one atop the other.
stacked100Stack “columns” and stretch each one to the same height as the others.

Examples

The examples in this section show how to use the syntax to help you get started.

Render a column chart

This query provides a visual representation of states with a high frequency of storm events, specifically those with more than 10 events, using a column chart.

StormEvents
| summarize event_count=count() by State
| where event_count > 10
| project State, event_count
| render columnchart

Screenshot of column chart visualization.

Use the ysplit property

This query provides a daily summary of storm-related injuries and deaths, visualized as a column chart with split axes/panels for better comparison.

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=axes)

Screenshot of column chart using ysplit axes property.

To split the view into separate panels, specify panels instead of axes:

StormEvents
| summarize
    TotalInjuries = sum(InjuriesDirect) + sum(InjuriesIndirect),
    TotalDeaths = sum(DeathsDirect) + sum(DeathsIndirect)
    by bin(StartTime, 1d)
| project StartTime, TotalInjuries, TotalDeaths
| render columnchart with (ysplit=panels)

Screenshot of column chart using ysplit panels property.

15.2.1.6 - Ladder chart visualization

This article describes the ladder chart visualization.

The last two columns are the x-axis, and the other columns are the y-axis.

Syntax

T | render ladderchart [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).

Examples

The examples in this section show how to use the syntax to help you get started.

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

Dates of storms by state

This query outputs a state-wise visualization of the duration of rain-related storm events, displayed as a ladder chart to help you analyze the temporal distribution of these events.

StormEvents
| where EventType  has "rain"
| summarize min(StartTime), max(EndTime) by State
| render ladderchart

Screenshot of ladderchart showing dates of storms by state.

Dates of storms by event type

This query outputs a visualization of the duration of various storm events in Washington, displayed as a ladder chart to help you analyze the temporal distribution of these events by type.

StormEvents
| where State == "WASHINGTON"
| summarize min(StartTime), max(EndTime) by EventType
| render ladderchart

Screenshot of ladderchart showing dates of storms by event type.

Dates of storms by state and event type

This query outputs a visualization of the duration of various storm events in states starting with “W”, displayed as a ladder chart to help you analyze the temporal distribution of these events by state and event type.

StormEvents
| where State startswith "W"
| summarize min(StartTime), max(EndTime) by State, EventType
| render ladderchart with (series=State, EventType)

Screenshot of ladderchart showing dates of storms by state and event type.

15.2.1.7 - Line chart visualization

This article describes the line chart visualization.

The line chart visual is the most basic type of chart. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Line charts track changes over short and long periods of time. When smaller changes exist, line graphs are more useful than bar graphs.

Syntax

T | render linechart [with ( propertyName = propertyValue [, …] )]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors (true or false).
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ysplitHow to split the visualization into multiple y-axis values. For more information, see ysplit property.
ytitleThe title of the y-axis (of type string).

ysplit property

This visualization supports splitting into multiple y-axis values:

ysplitDescription
noneA single y-axis is displayed for all series data. (Default)
axesA single chart is displayed with multiple y-axes (one per series).
panelsOne chart is rendered for each ycolumn value. Maximum five panels.

Examples

The examples in this section show how to use the syntax to help you get started.

Render a line chart

This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart.

StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart 

Screenshot of line chart visualization output.

Label a line chart

This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified titles for better clarity and presentation.

StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart
    with (
    title="Property damage from storms in Virginia",
    xtitle="Start time of storm",
    ytitle="Property damage"
    )

Screenshot of line chart with labels.

Limit values displayed on the y-axis

This query retrieves storm events in Virginia, focusing on the start time and property damage, and then displays this information in a line chart with specified y-axis limits for better visualization of the data.

StormEvents
| where State=="VIRGINIA"
| project StartTime, DamageProperty
| render linechart with (ymin=7000, ymax=300000)

Screenshot of line chart with limitations on y-axis values.

View multiple y-axes

This query retrieves hail events in Texas, Nebraska, and Kansas. It counts the number of hail events per day for each state, and then displays this information in a line chart with separate panels for each state.

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render linechart with (ysplit=panels)

Screenshot of the time chart query result with the ysplit panels property.
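
Accumulate values over time

The accumulate property listed in the supported properties isn't shown in the preceding examples. The following sketch assumes that setting accumulate to true turns each daily count into a running total on the rendered line:

StormEvents
| where State == "VIRGINIA"
| summarize DailyEvents = count() by bin(StartTime, 1d)
| render linechart with (accumulate=true, title="Cumulative storm events in Virginia")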

15.2.1.8 - Pie chart visualization

This article describes the pie chart visualization.

The pie chart visual needs a minimum of two columns in the query result. By default, the first column is used as the color axis. This column can contain text, datetime, or numeric data types. Other columns will be used to determine the size of each slice and contain numeric data types. Pie charts are used for presenting a composition of categories and their proportions out of a total.

The pie chart visual can also be used in the context of Geospatial visualizations.

Syntax

T | render piechart [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
kindFurther elaboration of the visualization kind. For more information, see kind property.
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

kind valueDescription
mapExpected columns are [Longitude, Latitude] or GeoJSON point, color-axis and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations

Example

This query provides a visual representation of the top 10 states with the highest number of storm events, displayed as a pie chart.

StormEvents
| summarize statecount=count() by State
| sort by statecount 
| limit 10
| render piechart with(title="Storm Events by State")

Screenshot of pie chart visualization output.

15.2.1.9 - Pivot chart visualization

This article describes the pivot chart visualization.

Displays a pivot table and chart. You can interactively select data, columns, rows, and various chart types.

Syntax

T | render pivotchart

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.

Example

This query provides a detailed analysis of sales for Contoso computer products within the specified date range, visualized as a pivot chart.

SalesFact
| join kind= inner Products on ProductKey
| where ProductCategoryName has "Computers" and ProductName has "Contoso"
| where DateKey between (datetime(2006-12-31) .. datetime(2007-02-01))
| project SalesAmount, ProductName, DateKey
| render pivotchart

Output

Screenshot of query result showing a pivot chart visualization.

15.2.1.10 - Plotly visualization

This article describes how to visualize data using the Plotly graphics library.

The Plotly graphics library supports ~80 chart types that are useful for advanced charting including geographic, scientific, machine learning, 3d, animation, and many other chart types. For more information, see Plotly.

To render a Plotly visual in Kusto Query Language, the query must generate a table with a single string cell containing Plotly JSON. This Plotly JSON string can be generated by one of the following two methods:

Write your own Plotly visualization in Python

In this method, you dynamically create the Plotly JSON string in Python using the Plotly package. This process requires use of the python() plugin. The Python script is run on the existing nodes using the inline python() plugin. It generates a Plotly JSON that is rendered by the client application.

All types of Plotly visualizations are supported.

Example

The following query uses inline Python to create a 3D scatter chart:

OccupancyDetection
| project Temperature, Humidity, CO2, Occupancy
| where rand() < 0.1
| evaluate python(typeof(plotly:string),
```if 1:
    import plotly.express as px
    fig = px.scatter_3d(df, x='Temperature', y='Humidity', z='CO2', color='Occupancy')
    fig.update_layout(title=dict(text="Occupancy detection, plotly 5.11.0"))
    plotly_obj = fig.to_json()
    result = pd.DataFrame(data = [plotly_obj], columns = ["plotly"])
```)

Screenshot of plotly visual type.

If the python() plugin isn't available in your environment, you can create the Plotly JSON by using a preprepared template instead.

Use a preprepared Plotly template

In this method, a preprepared Plotly JSON for specific visualization can be reused by replacing the data objects with the required data to be rendered. The templates can be stored in a standard table, and the data replacement logic can be packed in a stored function.

Currently, the supported templates are: plotly_anomaly_fl() and plotly_scatter3d_fl(). Refer to these documents for syntax and usage.

Example

let plotly_scatter3d_fl=(tbl:(*), x_col:string, y_col:string, z_col:string, aggr_col:string='', chart_title:string='3D Scatter chart')
{
    let scatter3d_chart = toscalar(PlotlyTemplate | where name == "scatter3d" | project plotly);
    let tbl_ex = tbl | extend _x = column_ifexists(x_col, 0.0), _y = column_ifexists(y_col, 0.0), _z = column_ifexists(z_col, 0.0), _aggr = column_ifexists(aggr_col, 'ALL');
    tbl_ex
    | serialize 
    | summarize _x=pack_array(make_list(_x)), _y=pack_array(make_list(_y)), _z=pack_array(make_list(_z)) by _aggr
    | summarize _aggr=make_list(_aggr), _x=make_list(_x), _y=make_list(_y), _z=make_list(_z)
    | extend plotly = scatter3d_chart
    | extend plotly=replace_string(plotly, '$CLASS1$', tostring(_aggr[0]))
    | extend plotly=replace_string(plotly, '$CLASS2$', tostring(_aggr[1]))
    | extend plotly=replace_string(plotly, '$CLASS3$', tostring(_aggr[2]))
    | extend plotly=replace_string(plotly, '$X_NAME$', x_col)
    | extend plotly=replace_string(plotly, '$Y_NAME$', y_col)
    | extend plotly=replace_string(plotly, '$Z_NAME$', z_col)
    | extend plotly=replace_string(plotly, '$CLASS1_X$', tostring(_x[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Y$', tostring(_y[0]))
    | extend plotly=replace_string(plotly, '$CLASS1_Z$', tostring(_z[0]))
    | extend plotly=replace_string(plotly, '$CLASS2_X$', tostring(_x[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Y$', tostring(_y[1]))
    | extend plotly=replace_string(plotly, '$CLASS2_Z$', tostring(_z[1]))
    | extend plotly=replace_string(plotly, '$CLASS3_X$', tostring(_x[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Y$', tostring(_y[2]))
    | extend plotly=replace_string(plotly, '$CLASS3_Z$', tostring(_z[2]))
    | extend plotly=replace_string(plotly, '$TITLE$', chart_title)
    | project plotly
};
Iris
| invoke plotly_scatter3d_fl(x_col='SepalLength', y_col='PetalLength', z_col='SepalWidth', aggr_col='Class', chart_title='3D scatter chart using plotly_scatter3d_fl()')
| render plotly

Screenshot of output of plotly example.

15.2.1.11 - Scatter chart visualization

This article describes the scatter chart visualization.

In a scatter chart visual, the first column is the x-axis and should be a numeric column. Other numeric columns are y-axes. Scatter plots are used to observe relationships between variables. The scatter chart visual can also be used in the context of Geospatial visualizations.

Syntax

T | render scatterchart [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
kindFurther elaboration of the visualization kind. For more information, see kind property.
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

kind valueDescription
mapExpected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations.

Example

This query provides a scatter chart that helps you analyze the correlation between state populations and the total property damage caused by storm events.

StormEvents
| summarize sum(DamageProperty) by State
| lookup PopulationData on State
| project-away State
| render scatterchart with (xtitle="State population", title="Property damage by state", legend=hidden)

Screenshot of scatter chart visualization output.
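
The map kind listed earlier isn't shown in the preceding example. The following minimal sketch assumes the StormEvents sample table exposes BeginLon and BeginLat coordinate columns, projected in [Longitude, Latitude] order:

StormEvents
| take 100
| project BeginLon, BeginLat
| render scatterchart with (kind=map)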

15.2.1.12 - Stacked area chart visualization

This article describes the stacked area chart visualization.

The stacked area chart visual shows a continuous relationship. This visual is similar to the Area chart, but shows the area under each element of a series. The first column of the query should be numeric and is used as the x-axis. Other numeric columns are the y-axes. Unlike line charts, area charts also visually represent volume. Area charts are ideal for indicating the change among different datasets.

Syntax

T | render stackedareachart [with (propertyName = propertyValue [, …])]

Supported parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).

Example

The following query summarizes data from the nyc_taxi table by number of passengers and visualizes the data in a stacked area chart. The x-axis shows the pickup time in two day intervals, and the stacked areas represent different passenger counts.

nyc_taxi
| summarize count() by passenger_count, bin(pickup_datetime, 2d)
| render stackedareachart with (xcolumn=pickup_datetime, series=passenger_count)

Output

Screenshot of stacked area chart visual output.

15.2.1.13 - Table visualization

This article describes the table visualization.

Default - results are shown as a table.

Syntax

T | render table [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).

Example

This query outputs a snapshot of the first 10 storm event records, displayed in a table format.

StormEvents
| take 10 
| render table 

Screenshot of table visualization output.

15.2.1.14 - Time chart visualization

This article describes the time chart visualization.

A time chart visual is a type of line graph. The first column of the query is the x-axis and should be a datetime. Other numeric columns are y-axes. The values of one string column are used to group the numeric columns and create different lines in the chart; other string columns are ignored. The time chart visual is like a line chart except that the x-axis is always time.

Syntax

T | render timechart [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors (true or false).
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ysplitHow to split the visualization into multiple y-axis values. For more information, see ysplit property.
ytitleThe title of the y-axis (of type string).

ysplit property

This visualization supports splitting into multiple y-axis values:

ysplitDescription
noneA single y-axis is displayed for all series data. (Default)
axesA single chart is displayed with multiple y-axes (one per series).
panelsOne chart is rendered for each ycolumn value. Maximum five panels.
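
For example, a minimal variant of the hail example shown later in this article, using ysplit=axes so that each state gets its own y-axis within a single chart:

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render timechart with (ysplit=axes)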

Examples

The examples in this section show how to use the syntax to help you get started.

Render a timechart

The following example renders a timechart with the title “Web app. traffic over a month, decomposition” that decomposes the data into baseline, seasonal, trend, and residual components.

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (baseline, seasonal, trend, residual) = series_decompose(num, -1, 'linefit')  //  decomposition of a set of time series to seasonal, trend, residual, and baseline (seasonal+trend)
| render timechart with(title='Web app. traffic over a month, decomposition')

Screenshot of timechart visualization output.

Label a timechart

The following example renders a timechart that depicts crop damage grouped by week. The timechart x axis label is “Date” and the y axis label is “Crop damage.”

StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31)) 
    and DamageCrops > 0
| summarize EventCount = count() by bin(StartTime, 7d)
| render timechart
    with (
    title="Crop damage over time",
    xtitle="Date",
    ytitle="Crop damage",
    legend=hidden
    )

Screenshot of timechart with labels.

View multiple y-axes

The following example renders daily hail events in the states of Texas, Nebraska, and Kansas. The visualization uses the ysplit property to render each state’s events in separate panels for comparison.

StormEvents
| where State in ("TEXAS", "NEBRASKA", "KANSAS") and EventType == "Hail"
| summarize count() by State, bin(StartTime, 1d)
| render timechart with (ysplit=panels)

Screenshot of the time chart query result with the ysplit panels property.


15.2.1.15 - Time pivot visualization

This article describes the time pivot visualization.

The time pivot visualization provides interactive navigation over the event timeline, pivoting on the time axis.

Syntax

T | render timepivot [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ytitleThe title of the y-axis (of type string).

Example

This query outputs a visualization of flood events in the specified Midwestern states, displayed as a time pivot chart.

let midwesternStates = dynamic([
    "ILLINOIS", "INDIANA", "IOWA", "KANSAS", "MICHIGAN", "MINNESOTA",
    "MISSOURI", "NEBRASKA", "NORTH DAKOTA", "OHIO", "SOUTH DAKOTA", "WISCONSIN"
]);
StormEvents
| where EventType == "Flood" and State in (midwesternStates)
| render timepivot with (xcolumn=State)

Output

Screenshot of timepivot visualization in Kusto.Explorer.

15.2.1.16 - Treemap visualization

Learn how to use the treemap visualization to visualize data.

Treemaps display hierarchical data as a set of nested rectangles. Each level of the hierarchy is represented by a colored rectangle (branch) containing smaller rectangles (leaves).

Syntax

T | render treemap [with (propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Supported properties

All properties are optional.

PropertyNamePropertyValue
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.

Example

This query counts the number of storm events for each type and state, sorts them in descending order, limits the results to the top 30, and then visualizes the data as a treemap.

StormEvents
| summarize StormEvents=count() by EventType, State
| sort by StormEvents
| limit 30
| render treemap with(title="Storm Events by EventType and State")

Screenshot of treemap visualization output.

15.2.2 - render operator

Learn how to use the render operator to instruct the user agent to render a visualization of the query results.

Instructs the user agent to render a visualization of the query results.

The render operator must be the last operator in the query, and can only be used with queries that produce a single tabular data stream result. The render operator doesn’t modify data. It injects an annotation (“Visualization”) into the result’s extended properties. The annotation contains the information provided by the operator in the query. The interpretation of the visualization information is done by the user agent. Different agents, such as Kusto.Explorer or Azure Data Explorer web UI, may support different visualizations.

The data model of the render operator looks at the tabular data as if it has three kinds of columns:

  • The x axis column (indicated by the xcolumn property).

  • The series columns (any number of columns indicated by the series property.) For each record, the combined values of these columns define a single series, and the chart has as many series as there are distinct combined values.

  • The y axis columns (any number of columns indicated by the ycolumns property). For each record, the series has as many measurements (“points” in the chart) as there are y-axis columns.

The user agent is free to guess the value of any property that isn’t specified by the query. In particular, having “uninteresting” columns in the schema of the result might cause it to guess wrong. Try projecting away such columns when that happens.
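
For illustration, a minimal sketch (assuming the StormEvents sample table used elsewhere in this article) that names all three kinds of columns explicitly:

StormEvents
| summarize EventCount = count() by bin(StartTime, 7d), EventType
| render linechart with (xcolumn=StartTime, ycolumns=EventCount, series=EventType)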

Syntax

T | render visualization [with ( propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
Tstring✔️Input table name.
visualizationstring✔️Indicates the kind of visualization to use. Must be one of the supported values in the following list.
propertyName, propertyValuestringA comma-separated list of key-value property pairs. See supported properties.

Visualization

visualizationDescriptionIllustration
anomalychartSimilar to timechart, but highlights anomalies using series_decompose_anomalies function.:::image type=“icon” source=“media/renderoperator/anomaly-chart.png” border=“false”:::
areachartArea graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes.:::image type=“icon” source=“media/renderoperator/area-chart.png” border=“false”:::
barchartFirst column is the x-axis and can be text, datetime or numeric. Other columns are numeric, displayed as horizontal strips.:::image type=“icon” source=“media/renderoperator/bar-chart.png” border=“false”:::
cardFirst result record is treated as set of scalar values and shows as a card.:::image type=“icon” source=“media/renderoperator/card.png” border=“false”:::
columnchartLike barchart with vertical strips instead of horizontal strips.:::image type=“icon” source=“media/renderoperator/column-chart.png” border=“false”:::
ladderchartLast two columns are the x-axis, other columns are y-axis.:::image type=“icon” source=“media/renderoperator/ladder-chart.png” border=“false”:::
linechartLine graph.:::image type=“icon” source=“media/renderoperator/line-chart.png” border=“false”:::
piechartFirst column is color-axis, second column is numeric.:::image type=“icon” source=“media/renderoperator/pie-chart.png” border=“false”:::
pivotchartDisplays a pivot table and chart. User can interactively select data, columns, rows and various chart types.:::image type=“icon” source=“media/renderoperator/pivot-chart.png” border=“false”:::
scatterchartPoints graph. First column is the x-axis and should be a numeric column. Other numeric columns are y-axes.:::image type=“icon” source=“media/renderoperator/scatter-chart.png” border=“false”:::
stackedareachartStacked area graph.:::image type=“icon” source=“media/renderoperator/stacked-area-chart.png” border=“false”:::
tableDefault - results are shown as a table.:::image type=“icon” source=“media/renderoperator/table-visualization.png” border=“false”:::
timechartLine graph. First column is x-axis, and must be datetime. Other (numeric) columns are y-axes. One string column’s values are used to “group” the numeric columns and create different lines in the chart (further string columns are ignored).:::image type=“icon” source=“media/renderoperator/visualization-timechart.png” border=“false”:::
timepivotInteractive navigation over the events time-line (pivoting on time axis).:::image type=“icon” source=“media/renderoperator/visualization-time-pivot.png” border=“false”:::
treemapDisplays hierarchical data as a set of nested rectangles.:::image type=“icon” source=“media/renderoperator/tree-map.png” border=“false”:::

Supported properties

PropertyName/PropertyValue indicate additional information to use when rendering. All properties are optional. The supported properties are:

PropertyNamePropertyValue
accumulateWhether the value of each measure gets added to all its predecessors. (true or false)
kindFurther elaboration of the visualization kind. For more information, see kind property.
legendWhether to display a legend or not (visible or hidden).
seriesComma-delimited list of columns whose combined per-record values define the series that record belongs to.
yminThe minimum value to be displayed on Y-axis.
ymaxThe maximum value to be displayed on Y-axis.
titleThe title of the visualization (of type string).
xaxisHow to scale the x-axis (linear or log).
xcolumnWhich column in the result is used for the x-axis.
xtitleThe title of the x-axis (of type string).
yaxisHow to scale the y-axis (linear or log).
ycolumnsComma-delimited list of columns that consist of the values provided per value of the x column.
ysplitHow to split the visualization into multiple y-axis values. For more information, see ysplit property.
ytitleThe title of the y-axis (of type string).
anomalycolumnsProperty relevant only for anomalychart. Comma-delimited list of columns that are considered anomaly series and displayed as points on the chart.

kind property

This visualization can be further elaborated by providing the kind property. The supported values of this property are:

VisualizationkindDescription
areachartdefaultEach “area” stands on its own.
unstackedSame as default.
stackedStack “areas” to the right.
stacked100Stack “areas” to the right and stretch each one to the same width as the others.
barchartdefaultEach “bar” stands on its own.
unstackedSame as default.
stackedStack “bars”.
stacked100Stack “bars” and stretch each one to the same width as the others.
columnchartdefaultEach “column” stands on its own.
unstackedSame as default.
stackedStack “columns” one atop the other.
stacked100Stack “columns” and stretch each one to the same height as the others.
scatterchartmapExpected columns are [Longitude, Latitude] or GeoJSON point. Series column is optional. For more information, see Geospatial visualizations.
piechartmapExpected columns are [Longitude, Latitude] or GeoJSON point, color-axis and numeric. Supported in Kusto Explorer desktop. For more information, see Geospatial visualizations.
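
For example, a minimal sketch (assuming the StormEvents sample table) that stacks the event counts of each state by event type:

StormEvents
| summarize EventCount = count() by State, EventType
| render columnchart with (kind=stacked, xcolumn=State, series=EventType, ycolumns=EventCount)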

ysplit property

Some visualizations support splitting into multiple y-axis values:

ysplitDescription
noneA single y-axis is displayed for all series data. (Default)
axesA single chart is displayed with multiple y-axes (one per series).
panelsOne chart is rendered for each ycolumn value. Maximum five panels.

How to render continuous data

Several visualizations are used for rendering sequences of values, for example, linechart, timechart, and areachart. These visualizations have the following conceptual model:

  • One column in the table represents the x-axis of the data. This column can be explicitly defined using the xcolumn property. If not defined, the user agent picks the first column that is appropriate for the visualization.
    • For example: in the timechart visualization, the user agent uses the first datetime column.
    • If this column is of type dynamic and it holds an array, the individual values in the array will be treated as the values of the x-axis.
  • One or more columns in the table represent one or more measures that vary by the x-axis. These columns can be explicitly defined using the ycolumns property. If not defined, the user agent picks all columns that are appropriate for the visualization.
    • For example: in the timechart visualization, the user agent uses all columns with a numeric value that haven’t been specified otherwise.
    • If the x-axis is an array, the values of each y-axis should also be an array of a similar length, with each y-axis occurring in a single column.
  • Zero or more columns in the table represent a unique set of dimensions that group together the measures. These columns can be specified by the series property, or the user agent will pick them automatically from the columns that are otherwise unspecified.


Example

InsightsMetrics
| where Computer == "DC00.NA.contosohotels.com"
| where Namespace  == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
| render timechart
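
The query above relies on the user agent picking TimeGenerated as the x-axis and avg_Val as the y-axis. When the x-axis column holds a dynamic array, for example the output of make-series, the individual array values are used as the x-axis values, as described above. A minimal sketch, assuming the StormEvents sample table:

StormEvents
| make-series EventCount = count() default = 0 on StartTime from datetime(2007-01-01) to datetime(2008-01-01) step 7d by EventType
| render timechart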

15.3 - Summarize operator

15.3.1 - Kusto partition & compose intermediate aggregation results

Learn how to use the hll() and tdigest() functions to partition and compose intermediate results of aggregations.

Suppose you want to calculate the count of distinct users every day over the last seven days. You can run summarize dcount(user) once a day with a span filtered to the last seven days. This method is inefficient, because each time the calculation is run, there’s a six-day overlap with the previous calculation. You can also calculate an aggregate for each day, and then combine these aggregates. This method requires you to “remember” the last six results, but it’s much more efficient.

Partitioning queries as described is easy for simple aggregates, such as count() and sum(). It can also be useful for complex aggregates, such as dcount() and percentiles(). This article explains how Kusto supports such calculations.

The following examples show how to use hll/tdigest and demonstrate that using these commands is highly performant in some scenarios:

range x from 1 to 1000000 step 1
| summarize hll(x,4)
| project sizeInMb = estimate_data_size(hll_x) / pow(1024,2)

Output

sizeInMb
1.0000524520874

Ingesting this object into a table before applying the special encoding policy described below will ingest null:

.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)
MyTable
| project isempty(hll_x)

Output

Column1
1

To avoid ingesting null, use the special encoding policy type bigobject, which overrides the MaxValueSize to 2 MB like this:

.alter column MyTable.hll_x policy encoding type='bigobject'

Ingesting a value now to the same table above:

.set-or-append MyTable <| range x from 1 to 1000000 step 1
| summarize hll(x,4)

ingests the second value successfully:

MyTable
| project isempty(hll_x)

Output

Column1
1
0

Example: Count with binned timestamp

There’s a table, PageViewsHllTDigest, containing hll values of Pages viewed in each hour. You want these values binned to 12h. Merge the hll values using the hll_merge() aggregate function, with the timestamp binned to 12h. Use the function dcount_hll to return the final dcount value:

PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 12h)
| project Timestamp , dcount_hll(merged_hll)

Output

Timestampdcount_hll_merged_hll
2016-05-01 12:00:00.000000020056275
2016-05-02 00:00:00.000000038797623
2016-05-02 12:00:00.000000039316056
2016-05-03 00:00:00.000000013685621

To bin timestamp for 1d:

PageViewsHllTDigest
| summarize merged_hll = hll_merge(hllPage) by bin(Timestamp, 1d)
| project Timestamp , dcount_hll(merged_hll)

Output

Timestampdcount_hll_merged_hll
2016-05-01 00:00:00.000000020056275
2016-05-02 00:00:00.000000064135183
2016-05-03 00:00:00.000000013685621

The same query may be done over the values of tdigest, which represent the BytesDelivered in each hour:

PageViewsHllTDigest
| summarize merged_tdigests = merge_tdigest(tdigestBytesDel) by bin(Timestamp, 12h)
| project Timestamp , percentile_tdigest(merged_tdigests, 95, typeof(long))

Output

Timestamppercentile_tdigest_merged_tdigests
2016-05-01 12:00:00.0000000170200
2016-05-02 00:00:00.0000000152975
2016-05-02 12:00:00.0000000181315
2016-05-03 00:00:00.0000000146817

Example: Temporary table

Kusto limits can be reached when datasets are too large and you need to run periodic queries over them, such as regular queries that calculate percentile() or dcount() over large datasets.

To solve this problem, newly added data may be added to a temp table as hll or tdigest values using hll() when the required operation is dcount or tdigest() when the required operation is percentile using set/append or update policy. In this case, the intermediate results of dcount or tdigest are saved into another dataset, which should be smaller than the target large one.


When you need to get the final results of these values, the queries may use hll/tdigest mergers: hll_merge()/tdigest_merge(). Then, after getting the merged values, percentile_tdigest()/dcount_hll() may be invoked on these merged values to get the final result of dcount or percentiles.
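
As an illustration, a minimal sketch of populating such a temp table with hourly intermediate results (the table and column names PageViews, Page, BytesDelivered, hllPage, and tdigestBytesDel are the ones used in the surrounding examples; an update policy could achieve the same effect):

.set-or-append PageViewsHllTDigest <| PageViews
| summarize hllPage = hll(Page), tdigestBytesDel = tdigest(BytesDelivered) by bin(Timestamp, 1h)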

Assume there’s a table, PageViews, into which data is ingested daily. Every day, you want to calculate the distinct count of pages viewed, per minute, later than date = datetime(2016-05-01 18:00:00.0000000).

Run the following query:

PageViews
| where Timestamp > datetime(2016-05-01 18:00:00.0000000)
| summarize percentile(BytesDelivered, 90), dcount(Page,2) by bin(Timestamp, 1d)

Output

Timestamppercentile_BytesDelivered_90dcount_Page
2016-05-01 00:00:00.00000008363420056275
2016-05-02 00:00:00.00000008277064135183
2016-05-03 00:00:00.00000007292013685621

This query aggregates all the values every time you run this query (for example, if you want to run it many times a day).

If you save the hll and tdigest values (which are the intermediate results of dcount and percentile) into a temp table, PageViewsHllTDigest, using an update policy or set/append commands, you may only merge the values and then use dcount_hll/percentile_tdigest using the following query:

PageViewsHllTDigest
| summarize  percentile_tdigest(merge_tdigest(tdigestBytesDel), 90), dcount_hll(hll_merge(hllPage)) by bin(Timestamp, 1d)

Output

Timestamppercentile_tdigest_merge_tdigests_tdigestBytesDeldcount_hll_hll_merge_hllPage
2016-05-01 00:00:00.00000008422420056275
2016-05-02 00:00:00.00000008348664135183
2016-05-03 00:00:00.00000007224713685621

This query should be more performant, as it runs over a smaller table. In this example, the first query runs over ~215M records, while the second runs over just 32 records.

Example: Intermediate results

The Retention Query. Assume you have a table that summarizes when each Wikipedia page was viewed (sample size is 10M), and for each pair of dates (date1, date2) you want to find the percentage of pages viewed on both date1 and date2, relative to the pages viewed on date1 (date1 < date2).

The trivial way uses join and summarize operators:

// Get the total pages viewed each day
let totalPagesPerDay = PageViewsSample
| summarize by Page, Day = startofday(Timestamp)
| summarize count() by Day;
// Join the table to itself to get a grid where 
// each row shows foreach page1, in which two dates
// it was viewed.
// Then count the pages between each two dates to
// get how many pages were viewed between date1 and date2.
PageViewsSample
| summarize by Page, Day1 = startofday(Timestamp)
| join kind = inner
(
    PageViewsSample
    | summarize by Page, Day2 = startofday(Timestamp)
)
on Page
| where Day2 > Day1
| summarize count() by Day1, Day2
| join kind = inner
    totalPagesPerDay
on $left.Day1 == $right.Day
| project Day1, Day2, Percentage = count_*100.0/count_1

Output

Day1Day2Percentage
2016-05-01 00:00:00.00000002016-05-02 00:00:00.000000034.0645725975255
2016-05-01 00:00:00.00000002016-05-03 00:00:00.000000016.618368960101
2016-05-02 00:00:00.00000002016-05-03 00:00:00.000000014.6291376489636

The above query took ~18 seconds.

When you use the hll(), hll_merge(), and dcount_hll() functions, the equivalent query ends after ~1.3 seconds, showing that the hll functions speed up the query above by a factor of ~14:

let Stats=PageViewsSample | summarize pagehll=hll(Page, 2) by day=startofday(Timestamp); // saving the hll values (intermediate results of the dcount values)
let day0=toscalar(Stats | summarize min(day)); // finding the min date over all dates.
let dayn=toscalar(Stats | summarize max(day)); // finding the max date over all dates.
let daycount=tolong((dayn-day0)/1d); // finding the range between max and min
Stats
| project idx=tolong((day-day0)/1d), day, pagehll
| mv-expand pidx=range(0, daycount) to typeof(long)
// Extend the column to get the dcount value from hll'ed values for each date (same as totalPagesPerDay from the above query)
| extend key1=iff(idx < pidx, idx, pidx), key2=iff(idx < pidx, pidx, idx), pages=dcount_hll(pagehll)
// For each two dates, merge the hll'ed values to get the total dcount over each two dates, 
// This helps to get the pages viewed in both date1 and date2 (see the description below about the intersection_size)
| summarize (day1, pages1)=arg_min(day, pages), (day2, pages2)=arg_max(day, pages), union_size=dcount_hll(hll_merge(pagehll)) by key1, key2
| where day2 > day1
// To get pages viewed in date1 and also date2, look at the merged dcount of date1 and date2, subtract it from pages of date1 + pages on date2.
| project pages1, day1,day2, intersection_size=(pages1 + pages2 - union_size)
| project day1, day2, Percentage = intersection_size*100.0 / pages1

Output

day1day2Percentage
2016-05-01 00:00:00.00000002016-05-02 00:00:00.000000033.2298494510578
2016-05-01 00:00:00.00000002016-05-03 00:00:00.000000016.9773830213667
2016-05-02 00:00:00.00000002016-05-03 00:00:00.000000014.5160020350006

15.3.2 - summarize operator

Learn how to use the summarize operator to produce a table that summarizes the content of the input table.

Produces a table that aggregates the content of the input table.

Syntax

T | summarize [ SummarizeParameters ] [[Column =] Aggregation [, …]] [by [Column =] GroupExpression [, …]]

Parameters

NameTypeRequiredDescription
ColumnstringThe name for the result column. Defaults to a name derived from the expression.
Aggregationstring✔️A call to an aggregation function such as count() or avg(), with column names as arguments.
GroupExpressionscalar✔️A scalar expression that can reference the input data. The output will have as many records as there are distinct values of all the group expressions.
SummarizeParametersstringZero or more space-separated parameters in the form of Name = Value that control the behavior. See supported parameters.

Supported parameters

NameDescription
hint.num_partitionsSpecifies the number of partitions used to share the query load on cluster nodes. See shuffle query
hint.shufflekey=<key>The shufflekey query shares the query load on cluster nodes, using a key to partition data. See shuffle query
hint.strategy=shuffleThe shuffle strategy query shares the query load on cluster nodes, where each node will process one partition of the data. See shuffle query
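
For example, a minimal sketch of a shuffled aggregation over a high-cardinality key (assuming the StormEvents sample table):

StormEvents
| summarize hint.shufflekey = EpisodeId count() by EpisodeId, EventType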

Returns

The input rows are arranged into groups having the same values of the by expressions. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by columns and also at least one column for each computed aggregate. (Some aggregation functions return multiple columns.)

The result has as many rows as there are distinct combinations of by values (which may be zero). If there are no group keys provided, the result has a single record.

To summarize over ranges of numeric values, use bin() to reduce ranges to discrete values.
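
For example, a minimal sketch (assuming the StormEvents sample table and its DamageProperty column) that groups property damage values into buckets of 10,000:

StormEvents
| summarize EventCount = count() by DamageBucket = bin(DamageProperty, 10000)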

Default values of aggregations

The following table summarizes the default values of aggregations:

OperatorDefault value
count(), countif(), dcount(), dcountif(), count_distinct(), sum(), sumif(), variance(), varianceif(), stdev(), stdevif()0
make_bag(), make_bag_if(), make_list(), make_list_if(), make_set(), make_set_if()empty dynamic array ([])
All othersnull

Examples

The examples in this section show how to use the syntax to help you get started.

Diagram of a summarize operation on price by fruit and supplier.

Unique combination

The following query determines what unique combinations of State and EventType there are for storms that resulted in direct injury. There are no aggregation functions, just group-by keys. The output will just show the columns for those results.

StormEvents
| where InjuriesDirect > 0
| summarize by State, EventType

Output

The following table shows only the first 5 rows. To see the full output, run the query.

StateEventType
TEXASThunderstorm Wind
TEXASFlash Flood
TEXASWinter Weather
TEXASHigh Wind
TEXASFlood

Minimum and maximum timestamp

Finds the minimum and maximum duration of heavy rain storms in Hawaii. There’s no group-by clause, so there’s just one row in the output.

StormEvents
| where State == "HAWAII" and EventType == "Heavy Rain"
| project Duration = EndTime - StartTime
| summarize Min = min(Duration), Max = max(Duration)

Output

MinMax
01:08:0011:55:00

Distinct count

The following query calculates the number of unique storm event types for each state and sorts the results by the number of unique storm types:

StormEvents
| summarize TypesOfStorms=dcount(EventType) by State
| sort by TypesOfStorms

Output

The following table shows only the first 5 rows. To see the full output, run the query.

StateTypesOfStorms
TEXAS27
CALIFORNIA26
PENNSYLVANIA25
GEORGIA24
ILLINOIS23

Histogram

The following example calculates a histogram of storm event types that had storms lasting longer than 1 day. Because Duration has many values, use bin() to group its values into 1-day intervals.

StormEvents
| project EventType, Duration = EndTime - StartTime
| where Duration > 1d
| summarize EventCount=count() by EventType, Length=bin(Duration, 1d)
| sort by Length

Output

EventTypeLengthEventCount
Drought30.00:00:001646
Wildfire30.00:00:0011
Heat30.00:00:0014
Flood30.00:00:0020
Heavy Rain29.00:00:0042

Aggregates default values

When the input of summarize operator has at least one empty group-by key, its result is empty, too.

When the input of summarize operator doesn’t have an empty group-by key, the result is the default values of the aggregates used in the summarize operator. For more information, see Default values of aggregations.

datatable(x:long)[]
| summarize any_x=take_any(x), arg_max_x=arg_max(x, *), arg_min_x=arg_min(x, *), avg(x), buildschema(todynamic(tostring(x))), max(x), min(x), percentile(x, 55), hll(x) ,stdev(x), sum(x), sumif(x, x > 0), tdigest(x), variance(x)

Output

any_xarg_max_xarg_min_xavg_xschema_xmax_xmin_xpercentile_x_55hll_xstdev_xsum_xsumif_xtdigest_xvariance_x
NaN0000

The result of avg(x) is NaN due to dividing by 0.

datatable(x:long)[]
| summarize  count(x), countif(x > 0) , dcount(x), dcountif(x, x > 0)

Output

count_xcountif_dcount_xdcountif_x
0000
datatable(x:long)[]
| summarize  make_set(x), make_list(x)

Output

set_xlist_x
[][]

The aggregate avg sums all the non-nulls and counts only those which participated in the calculation (won’t take nulls into account).

range x from 1 to 4 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize sum(y), avg(y)

Output

sum_yavg_y
155

The regular count will count nulls:

range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize count(y)

Output

count_y
2
range x from 1 to 2 step 1
| extend y = iff(x == 1, real(null), real(5))
| summarize make_set(y), make_set(y)

Output

set_yset_y1
[5.0][5.0]

15.4 - as operator

Learn how to use the as operator to bind a name to the operator’s input tabular expression.

Binds a name to the operator’s input tabular expression. This operator allows the query to reference the value of the tabular expression multiple times without breaking the query and binding a name through the let statement.

To optimize multiple uses of the as operator within a single query, see Named expressions.

Syntax

T | as [hint.materialized = Materialized] Name

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular expression to rename.
Namestring✔️The temporary name for the tabular expression.
hint.materializedboolIf Materialized is set to true, the value of the tabular expression output is wrapped by a materialize() function call. Otherwise, the value is recalculated on every reference.

Examples

In the following two examples, the generated TableName column consists of ‘T1’ and ‘T2’.

range x from 1 to 5 step 1 
| as T1 
| union withsource=TableName (range x from 1 to 5 step 1 | as T2)

Alternatively, you can write the same example as follows:

union withsource=TableName (range x from 1 to 5 step 1 | as T1), (range x from 1 to 5 step 1 | as T2)

Output

TableNamex
T11
T12
T13
T14
T15
T21
T22
T23
T24
T25

In the following example, the ’left side’ of the join is: MyLogTable filtered by type == "Event" and Name == "Start" and the ‘right side’ of the join is: MyLogTable filtered by type == "Event" and Name == "Stop"

MyLogTable  
| where type == "Event"
| as T
| where Name == "Start"
| join (
    T
    | where Name == "Stop"
) on ActivityId
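
A minimal sketch of the hint.materialized parameter applied to the same query, so the value bound to T is cached rather than recalculated on every reference:

MyLogTable
| where type == "Event"
| as hint.materialized=true T
| where Name == "Start"
| join (
    T
    | where Name == "Stop"
) on ActivityId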

15.5 - consume operator

Learn how to use the consume operator to consume the tabular data stream handed to the operator.

Consumes the tabular data stream handed to the operator.

The consume operator is mostly used for triggering the query side-effect without actually returning the results back to the caller.

The consume operator can be used for estimating the cost of a query without actually delivering the results back to the client. (The estimation isn’t exact for various reasons; for example, consume is calculated distributively, so T | consume won’t transmit the table’s data between the nodes of the cluster.)

Syntax

consume [decodeblocks = DecodeBlocks]

Parameters

NameTypeRequiredDescription
DecodeBlocksboolIf set to true, or if the request property perftrace is set to true, the consume operator won’t just enumerate the records at its input, but actually force each value in those records to be decompressed and decoded.
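
For example, a minimal sketch that runs a query for its side effects and cost estimation without returning rows to the client (assuming the StormEvents sample table):

StormEvents
| where EventType == "Flood"
| consume

Per the syntax above, decodeblocks = true can be appended after consume to force each value to be decompressed and decoded as well.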

15.6 - count operator

Learn how to use the count operator to return the number of records in the input record set.

Returns the number of records in the input record set.

Syntax

T | count

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input whose records are to be counted.

Returns

This function returns a table with a single record and column of type long. The value of the only cell is the number of records in T.

Example

When you use the count operator with a table name, like StormEvents, it will return the total number of records in that table.

StormEvents | count

Output

Count
59066
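
The count operator can also follow any tabular expression. For example, a minimal sketch that counts only the records for a single state:

StormEvents
| where State == "TEXAS"
| count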

For information about the count() aggregation function, see count() (aggregation function).

15.7 - datatable operator

Learn how to use the datatable operator to define a table with given schema and data.

Returns a table whose schema and values are defined in the query itself.

Syntax

datatable( ColumnName : ColumnType [, …]) [ ScalarValue [, …] ]

Parameters

NameTypeRequiredDescription
ColumnNamestring✔️The name for a column.
ColumnTypestring✔️The type of data in the column.
ScalarValuescalar✔️The value to insert into the table. The total number of values must be a multiple of the number of columns in the table. Each value is assigned to a column based on its position. Specifically, the n’th value is assigned to the column at position n % NumColumns, where NumColumns is the total number of columns.

Returns

This operator returns a data table of the given schema and data.

Example

This example creates a table with Date, Event, and MoreData columns, filters rows with Event descriptions longer than 4 characters, and adds a new column key2 to each row from the MoreData dynamic object.

datatable(Date:datetime, Event:string, MoreData:dynamic) [
    datetime(1910-06-11), "Born", dynamic({"key1":"value1", "key2":"value2"}),
    datetime(1930-01-01), "Enters Ecole Navale", dynamic({"key1":"value3", "key2":"value4"}),
    datetime(1953-01-01), "Published first book", dynamic({"key1":"value5", "key2":"value6"}),
    datetime(1997-06-25), "Died", dynamic({"key1":"value7", "key2":"value8"}),
]
| where strlen(Event) > 4
| extend key2 = MoreData.key2

Output

DateEventMoreDatakey2
1930-01-01 00:00:00.0000000Enters Ecole Navale{
“key1”: “value3”,
“key2”: “value4”
}
value4
1953-01-01 00:00:00.0000000Published first book{
“key1”: “value5”,
“key2”: “value6”
}
value6

15.8 - distinct operator

Learn how to use the distinct operator to create a table with the distinct combination of the columns of the input table.

Produces a table with the distinct combination of the provided columns of the input table.

Syntax

T | distinct ColumnName[,ColumnName2, ...]

Parameters

NameTypeRequiredDescription
ColumnNamestring✔️The column name to search for distinct values.

Example

Shows distinct combination of states and type of events that led to over 45 direct injuries.

StormEvents
| where InjuriesDirect > 45
| distinct State, EventType

Output

StateEventType
TEXASWinter Weather
KANSASTornado
MISSOURIExcessive Heat
OKLAHOMAThunderstorm Wind
OKLAHOMAExcessive Heat
ALABAMATornado
ALABAMAHeat
TENNESSEEHeat
CALIFORNIAWildfire

If the group by keys are of high cardinalities, try summarize by ... with the shuffle strategy.
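
For example, a minimal sketch of the same distinct combinations computed with summarize and the shuffle strategy:

StormEvents
| where InjuriesDirect > 45
| summarize hint.strategy=shuffle by State, EventType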

15.9 - evaluate plugin operator

Learn how to use the evaluate plugin operator to invoke plugins.

Invokes a service-side query extension (plugin).

The evaluate operator is a tabular operator that allows you to invoke query language extensions known as plugins. Unlike other language constructs, plugins can be enabled or disabled. Plugins aren’t “bound” by the relational nature of the language. In other words, they may not have a predefined, statically determined, output schema.

Syntax

[T |] evaluate [ evaluateParameters ] PluginName ([ PluginArgs ])

Parameters

NameTypeRequiredDescription
TstringA tabular input to the plugin. Some plugins don’t take any input and act as a tabular data source.
evaluateParametersstringZero or more space-separated evaluate parameters in the form of Name = Value that control the behavior of the evaluate operation and execution plan. Each plugin may decide differently how to handle each parameter. Refer to each plugin’s documentation for specific behavior.
PluginNamestring✔️The mandatory name of the plugin being invoked.
PluginArgsstringZero or more comma-separated arguments to provide to the plugin.

Evaluate parameters

The following parameters are supported:

NameValuesDescription
hint.distributionsingle, per_node, per_shardDistribution hints
hint.pass_filterstrue, falseAllow evaluate operator to passthrough any matching filters before the plugin. Filter is considered as ‘matched’ if it refers to a column existing before the evaluate operator. Default: false
hint.pass_filters_columncolumn_nameAllow plugin operator to passthrough filters referring to column_name before the plugin. Parameter can be used multiple times with different column names.

Plugins

The following plugins are supported:

Distribution hints

Distribution hints specify how the plugin execution will be distributed across multiple cluster nodes. Each plugin may implement a different support for the distribution. The plugin’s documentation specifies the distribution options supported by the plugin.

Possible values:

  • single: A single instance of the plugin will run over the entire query data.
  • per_node: If the query before the plugin call is distributed across nodes, then an instance of the plugin will run on each node over the data that it contains.
  • per_shard: If the data before the plugin call is distributed across shards, then an instance of the plugin will run over each shard of the data.
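
For example, a minimal sketch invoking the bag_unpack plugin, which expands a dynamic property-bag column into separate columns (the inline datatable is used here only for illustration):

datatable(d: dynamic)
[
    dynamic({"Name": "John", "Age": 20}),
    dynamic({"Name": "Dave", "Age": 40})
]
| evaluate bag_unpack(d)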

15.10 - extend operator

Learn how to use the extend operator to create calculated columns and append them to the result set.

Creates calculated columns and appends them to the result set.

Syntax

T | extend [ColumnName | (ColumnName[, …]) =] Expression [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️Tabular input to extend.
ColumnNamestringName of the column to add or update.
Expressionstring✔️Calculation to perform over the input.
  • If ColumnName is omitted, the output column name of Expression is automatically generated.
  • If Expression returns more than one column, a list of column names can be specified in parentheses. Then, Expression’s output columns is given the specified names. If a list of the column names isn’t specified, all Expression’s output columns with generated names are added to the output.

Returns

A copy of the input tabular result set, such that:

  1. Column names noted by extend that already exist in the input are removed and appended as their new calculated values.
  2. Column names noted by extend that don’t exist in the input are appended as their new calculated values.

A column added by extend doesn't have an index. In most cases, if the new column is set to be exactly the same as an existing table column that has an index, Kusto can automatically use the existing index. However, in some complex scenarios this propagation is not done. In such cases, if the goal is to rename a column, use the project-rename operator instead.

Example

StormEvents
| project EndTime, StartTime
| extend Duration = EndTime - StartTime

The following table shows only the first 10 results. To see the full output, run the query.

EndTimeStartTimeDuration
2007-01-01T00:00:00Z2007-01-01T00:00:00Z00:00:00
2007-01-01T00:25:00Z2007-01-01T00:25:00Z00:00:00
2007-01-01T02:24:00Z2007-01-01T02:24:00Z00:00:00
2007-01-01T03:45:00Z2007-01-01T03:45:00Z00:00:00
2007-01-01T04:35:00Z2007-01-01T04:35:00Z00:00:00
2007-01-01T04:37:00Z2007-01-01T03:37:00Z01:00:00
2007-01-01T05:00:00Z2007-01-01T00:00:00Z05:00:00
2007-01-01T05:00:00Z2007-01-01T00:00:00Z05:00:00
2007-01-01T06:00:00Z2007-01-01T00:00:00Z06:00:00
2007-01-01T06:00:00Z2007-01-01T00:00:00Z06:00:00

15.11 - externaldata operator

Learn how to use the externaldata operator to return a data table of the given schema whose data was parsed from the specified storage artifact.

The externaldata operator returns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.

Syntax

externaldata (columnName:columnType [, …] ) [ storageConnectionString [, …] ] [with ( propertyName = propertyValue [, …])]

Parameters

NameTypeRequiredDescription
columnName, columnTypestring✔️A list of column names and their types. This list defines the schema of the table.
storageConnectionStringstring✔️A storage connection string of the storage artifact to query.
propertyName, propertyValuestringA list of optional supported properties that determines how to interpret the data retrieved from storage.

Supported properties

PropertyTypeDescription
formatstringThe data format. If unspecified, an attempt is made to detect the data format from file extension. The default is CSV. All ingestion data formats are supported.
ignoreFirstRecordboolIf set to true, the first record in every file is ignored. This property is useful when querying CSV files with headers.
ingestionMappingstringIndicates how to map data from the source file to the actual columns in the operator result set. See data mappings.

Returns

The externaldata operator returns a data table of the given schema whose data was parsed from the specified storage artifact, indicated by the storage connection string.

Examples

The examples query data in an external storage file.

Fetch a list of user IDs stored in Azure Blob Storage

The following example shows how to find all records in a table whose UserID column falls into a known set of IDs, held (one per line) in an external storage file. Since the data format isn’t specified, the detected data format is TXT.

Users
| where UserID in ((externaldata (UserID:string) [
    @"https://storageaccount.blob.core.windows.net/storagecontainer/users.txt" 
      h@"?...SAS..." // Secret token needed to access the blob
    ]))
| ...

Query multiple data files

The following example queries multiple data files stored in external storage.

externaldata(Timestamp:datetime, ProductId:string, ProductDescription:string)
[
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/01/part-00000-7e967c99-cf2b-4dbb-8c53-ce388389470d.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/02/part-00000-ba356fa4-f85f-430a-8b5a-afd64f128ca4.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.windows.net/archivedproducts/2019/01/03/part-00000-acb644dc-2fc6-467c-ab80-d1590b23fc31.csv.gz?...SAS..."
]
with(format="csv")
| summarize count() by ProductId

The above example can be thought of as a quick way to query multiple data files without defining an external table.

Query hierarchical data formats

To query hierarchical data format, such as JSON, Parquet, Avro, or ORC, ingestionMapping must be specified in the operator properties. In this example, there’s a JSON file stored in Azure Blob Storage with the following contents:

{
  "timestamp": "2019-01-01 10:00:00.238521",   
  "data": {    
    "tenant": "e1ef54a6-c6f2-4389-836e-d289b37bcfe0",   
    "method": "RefreshTableMetadata"   
  }   
}   
{
  "timestamp": "2019-01-01 10:00:01.845423",   
  "data": {   
    "tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324",   
    "method": "GetFileList"   
  }   
}
...

To query this file using the externaldata operator, a data mapping must be specified. The mapping dictates how to map JSON fields to the operator result set columns:

externaldata(Timestamp: datetime, TenantId: guid, MethodName: string)
[ 
   h@'https://mycompanystorage.blob.core.windows.net/events/2020/09/01/part-0000046c049c1-86e2-4e74-8583-506bda10cca8.json?...SAS...'
]
with(format='multijson', ingestionMapping='[{"Column":"Timestamp","Properties":{"Path":"$.timestamp"}},{"Column":"TenantId","Properties":{"Path":"$.data.tenant"}},{"Column":"MethodName","Properties":{"Path":"$.data.method"}}]')

The MultiJSON format is used here because single JSON records are spanned into multiple lines.

For more info on mapping syntax, see data mappings.

15.12 - facet operator

Learn how to use the facet operator to return a table for each specified column.

Returns a set of tables, one for each column specified in the facet clause. Each table contains the list of values taken by its column. An additional table can be created by using the with clause. Facet result tables can’t be renamed or referenced by any additional operators.

Syntax

T | facet by ColumnName [, ColumnName2, …] [with ( filterPipe )]

Parameters

NameTypeRequiredDescription
ColumnNamestring✔️The column name, or list of column names, to be summarized.
filterPipestringA query expression applied to the input table.

Returns

Multiple tables: one for the with clause, and one for each column.

Example

StormEvents
| where State startswith "A" and EventType has "Heavy"
| facet by State, EventType
    with 
    (
    where StartTime between(datetime(2007-01-04) .. 7d) 
    | project State, StartTime, Source, EpisodeId, EventType
    | take 5
    )

The following is the table generated by the with clause.

StateStartTimeSourceEpisodeIdEventType
ALASKA2007-01-04 12:00:00.0000000COOP Observer2192Heavy Snow
ALASKA2007-01-04 15:00:00.0000000Trained Spotter2192Heavy Snow
ALASKA2007-01-04 15:00:00.0000000Trained Spotter2192Heavy Snow
ALASKA2007-01-04 15:00:00.0000000Trained Spotter2192Heavy Snow
ALASKA2007-01-06 18:00:00.0000000COOP Observer2193Heavy Snow

The following table is the State facet output table.

Statecount_State
ALABAMA19
ARIZONA33
ARKANSAS1
AMERICAN SAMOA1
ALASKA58

The following table is the EventType facet output table.

EventTypecount_EventType
Heavy Rain34
Heavy Snow78

15.13 - find operator

Learn how to use the find operator to find rows that match a predicate across a set of tables.

Finds rows that match a predicate across a set of tables.

The scope of the find operator can also be cross-database or cross-cluster.

find in (Table1, Table2, Table3) where Fruit=="apple"

find in (database('*').*) where Fruit == "apple"

find in (cluster('cluster_name').database('MyDB*').*) where Fruit == "apple"

Syntax

  • find [withsource= ColumnName] [in (Tables)] where Predicate [project-smart | project ColumnName[: ColumnType , … ] [, pack_all()]]

  • find Predicate [project-smart | project ColumnName[: ColumnType , … ] [, pack_all()]]

Parameters

NameTypeRequiredDescription
ColumnNamestringBy default, the output includes a column called source_ whose values indicate which source table contributed to each row. If specified, ColumnName is used instead of source_. After wildcard matching, if the query references tables from more than one database including the default database, the value of this column has a table name qualified with the database. Similarly cluster and database qualifications are present in the value if more than one cluster is referenced.
Predicatebool✔️This boolean expression is evaluated for each row in each input table. For more information, see predicate-syntax details.
TablesstringZero or more comma-separated table references. By default, find looks in all the tables in the current database. You can use:
1. The name of a table, such as Events
2. A query expression, such as (Events | where id==42)
3. A set of tables specified with a wildcard. For example, E* forms the union of all the tables in the database whose names begin with E.
project-smart or projectstringIf not specified, project-smart is used by default. For more information, see output-schema details.

Returns

Transformation of rows in Table [, Table, …] for which Predicate is true. The rows are transformed according to the output schema.

Output schema

source_ column

The find operator output always includes a source_ column with the source table name. The column can be renamed using the withsource parameter.

results columns

Source tables that don’t contain any column used by the predicate evaluation, are filtered out.

When you use project-smart, the columns that appear in the output are:

  • Columns that appear explicitly in the predicate.
  • Columns that are common to all the filtered tables.

The rest of the columns are packed into a property bag and appear in an extra pack column. A column that is referenced explicitly by the predicate and appears in multiple tables with multiple types, has a different column in the result schema for each such type. Each of the column names is constructed from the original column name and type, separated by an underscore.

When using project ColumnName[: ColumnType , … ] [, pack_all()]:

  • The result table includes the columns specified in the list. If a source table doesn’t contain a certain column, the values in the corresponding rows are null.
  • When you specify a ColumnType with a ColumnName, this column in the “result” has the given type, and the values are cast to that type if needed. The casting doesn’t have an effect on the column type when evaluating the Predicate.
  • When pack_all() is used, all the columns, including the projected columns, are packed into a property bag and appear in an extra column, by default ‘column1’. In the property bag, the source column name serves as the property name and the column’s value serves as the property value.
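
For example, a minimal sketch (Table1 and Table2 are the same placeholder table names used earlier) that projects one typed column and packs the remaining columns:

find in (Table1, Table2) where Fruit == "apple" project Fruit: string, pack_all()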

Predicate syntax

The find operator supports an alternative syntax for the * has term: using just term searches for the term across all input columns.

For a summary of some filtering functions, see where operator.

Considerations

  • If the project clause references a column that appears in multiple tables and has multiple types, a type must follow this column reference in the project clause
  • If a column appears in multiple tables and has multiple types and project-smart is in use, there’s a corresponding column for each type in the find’s result, as described in union
  • When you use project-smart, changes in the predicate, in the source tables set, or in the tables schema, might result in a change to the output schema. If a constant result schema is needed, use project instead
  • find scope can’t include functions. To include a function in the find scope, define a let statement with view keyword.

Performance tips

  • Use tables as opposed to tabular expressions. If a tabular expression is used, the find operator falls back to a union query, which can result in degraded performance.
  • If a column that appears in multiple tables and has multiple types is part of the project clause, prefer adding a ColumnType to the project clause over modifying the table before passing it to find.
  • Add time-based filters to the predicate. Use a datetime column value or ingestion_time(). See the sketch after these tips.
  • Search in specific columns rather than a full text search.
  • It’s better not to reference columns that appear in multiple tables and have multiple types. If the predicate is valid when resolving such columns type for more than one type, the query falls back to union. For example, see examples of cases where find acts as a union.
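
For example, a minimal sketch that applies these tips, assuming tables prefixed with E that have Timestamp and EventText columns:

find in (E*) where Timestamp > ago(1d) and EventText has "Kusto"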

Examples

Term lookup across all tables

The query finds all rows from all tables in the current database in which any column includes the word Hernandez. The resulting records are transformed according to the output schema. The output includes rows from the Customers table and the SalesTable table of the ContosoSales database.

find "Hernandez"

Output

This table shows the first three rows of the output.

source_pack_
Customers{“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:5023,“Education”:“Partial High School”,“FirstName”:“Devin”,“Gender”:“M”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Clerical”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”}
Customers{“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:7814,“Education”:“Partial College”,“FirstName”:“Kristy”,“Gender”:“F”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Professional”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”}
Customers{“CityName”:“Ballard”,“CompanyName”:“NULL”,“ContinentName”:“North America”,“CustomerKey”:7888,“Education”:“Partial High School”,“FirstName”:“Kari”,“Gender”:“F”,“LastName”:“Hernandez”,“MaritalStatus”:“S”,“Occupation”:“Clerical”,“RegionCountryName”:“United States”,“StateProvinceName”:“Washington”}

Term lookup across all tables matching a name pattern

The query finds all rows from all tables in the current database whose name starts with C, and in which any column includes the word Hernandez. The resulting records are transformed according to the output schema. Now, the output only contains records from the Customers table.

find in (C*) where * has "Hernandez"

Output

This table shows the first three rows of the output.

source_pack_
ConferenceSessions{“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”}
ConferenceSessions{“conference”:“Ignite 2018”,“sessionid”:“THR3115”,“session_title”:“Azure Log Analytics: Deep dive into the Azure Kusto query language. “,“session_type”:“Theater”,“owner”:“Jean Francois Berenguer”,“participants”:””,“URL”:“https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"}
ConferenceSessions{“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”}

Term lookup across the cluster

The query finds all rows from all tables in all databases in the cluster in which any column includes the word Kusto. This query is a cross-database query. The resulting records are transformed according to the output schema.

find in (database('*').*) where * has "Kusto"

Output

This table shows the first three rows of the output.

source_pack_
database(“Samples”).ConferenceSessions{“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”}
database(“Samples”).ConferenceSessions{“conference”:“Ignite 2018”,“sessionid”:“THR3115”,“session_title”:“Azure Log Analytics: Deep dive into the Azure Kusto query language. “,“session_type”:“Theater”,“owner”:“Jean Francois Berenguer”,“participants”:””,“URL”:“https://myignite.techcommunity.microsoft.com/sessions/66329","level":300,"session_location":"","starttime":null,"duration":null,"time_and_duration":"","kusto_affinity":"Focused"}
database(“Samples”).ConferenceSessions{“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”}

Term lookup matching a name pattern in the cluster

The query finds all rows from all tables whose name starts with C in all databases whose name starts with S and in which any column includes the word Kusto. The resulting records are transformed according to the output schema.

find in (database("S*").C*) where * has "Kusto"

Output

This table shows the first three rows of the output.

source_pack_
ConferenceSessions{“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”}
ConferenceSessions{“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”}
ConferenceSessions{“conference”:“Build 2021”,“sessionid”:“CON-PRT103”,“session_title”:“Roundtable: Advanced Kusto query language topics”,“session_type”:“Roundtable”,“owner”:“Avner Aharoni”,“participants”:“Alexander Sloutsky, Tzvia Gitlin-Troyna”,“URL”:“https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7","level":200,"session_location":"Online","starttime":"2021-05-26T08:30:00.0000000Z","duration":60,"time_and_duration":"Wednesday, May 26\n8:30 AM - 9:30 AM GMT”,“kusto_affinity”:“Focused”}

Term lookup in several clusters

The query finds all rows from all tables whose name starts with K in all databases whose name starts with B and in which any column includes the word Kusto. The resulting records are transformed according to the output schema.

find in (cluster("cluster1").database("B*").K*, cluster("cluster2").database("C*").*)
where * has "Kusto"

Term lookup across all tables

The query finds all rows from all tables in which any column includes the word Kusto. The resulting records are transformed according to the output schema.

find "Kusto"

Examples of find output results

The following examples show how find can be used over two tables: EventsTable1 and EventsTable2. Assume these two tables have the following content:

EventsTable1

Session_IdLevelEventTextVersion
acbd207d-51aa-4df7-bfa7-be70eb68f04eInformationSome Text1v1.0.0
acbd207d-51aa-4df7-bfa7-be70eb68f04eErrorSome Text2v1.0.0
28b8e46e-3c31-43cf-83cb-48921c3986fcErrorSome Text3v1.0.1
8f057b11-3281-45c3-a856-05ebb18a3c59InformationSome Text4v1.1.0

EventsTable2

Session_IdLevelEventTextEventName
f7d5f95f-f580-4ea6-830b-5776c8d64fddInformationSome Other Text1Event1
acbd207d-51aa-4df7-bfa7-be70eb68f04eInformationSome Other Text2Event2
acbd207d-51aa-4df7-bfa7-be70eb68f04eErrorSome Other Text3Event3
15eaeab5-8576-4b58-8fc6-478f75d8fee4ErrorSome Other Text4Event4

Search in common columns, project common, and uncommon columns, and pack the rest

The query searches for specific records in EventsTable1 and EventsTable2 based on a given Session_Id and an Error Level. It then projects three specific columns: EventText, Version, and EventName, and packs all other remaining columns into a dynamic object.

find in (EventsTable1, EventsTable2) 
     where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' and Level == 'Error' 
     project EventText, Version, EventName, pack_all()

Output

source_EventTextVersionEventNamepack_
EventsTable1Some Text2v1.0.0{“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”}
EventsTable2Some Other Text3Event3{“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”}

Search in common and uncommon columns

The query searches for records that either have Version as ‘v1.0.0’ or EventName as ‘Event1’, and then it projects (selects) four specific columns: Session_Id, EventText, Version, and EventName from those filtered results.

find Version == 'v1.0.0' or EventName == 'Event1' project Session_Id, EventText, Version, EventName

Output

source_Session_IdEventTextVersionEventName
EventsTable1acbd207d-51aa-4df7-bfa7-be70eb68f04eSome Text1v1.0.0
EventsTable1acbd207d-51aa-4df7-bfa7-be70eb68f04eSome Text2v1.0.0
EventsTable2f7d5f95f-f580-4ea6-830b-5776c8d64fddSome Other Text1Event1

Use abbreviated notation to search across all tables in the current database

This query searches the database for any records with a Session_Id that matches ‘acbd207d-51aa-4df7-bfa7-be70eb68f04e’. It retrieves records from all tables and columns that contain this specific Session_Id.

find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'

Output

source_Session_IdLevelEventTextpack_
EventsTable1acbd207d-51aa-4df7-bfa7-be70eb68f04eInformationSome Text1{“Version”:“v1.0.0”}
EventsTable1acbd207d-51aa-4df7-bfa7-be70eb68f04eErrorSome Text2{“Version”:“v1.0.0”}
EventsTable2acbd207d-51aa-4df7-bfa7-be70eb68f04eInformationSome Other Text2{“EventName”:“Event2”}
EventsTable2acbd207d-51aa-4df7-bfa7-be70eb68f04eErrorSome Other Text3{“EventName”:“Event3”}

Return the results from each row as a property bag

This query searches the database for records with the specified Session_Id and returns all columns of those records as a single dynamic object.

find Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e' project pack_all()

Output

source_pack_
EventsTable1{“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Information”, “EventText”:“Some Text1”, “Version”:“v1.0.0”}
EventsTable1{“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”, “EventText”:“Some Text2”, “Version”:“v1.0.0”}
EventsTable2{“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Information”, “EventText”:“Some Other Text2”, “EventName”:“Event2”}
EventsTable2{“Session_Id”:“acbd207d-51aa-4df7-bfa7-be70eb68f04e”, “Level”:“Error”, “EventText”:“Some Other Text3”, “EventName”:“Event3”}

Examples of cases where find acts as union

The find operator in Kusto can sometimes act like a union operator, mainly when it’s used to search across multiple tables.

Using a nontabular expression as find operand

The query first creates a view that filters EventsTable1 to only include error-level records. Then, it searches within this filtered view and the EventsTable2 table for records with a specific Session_Id.

let PartialEventsTable1 = view() { EventsTable1 | where Level == 'Error' };
find in (PartialEventsTable1, EventsTable2) 
     where Session_Id == 'acbd207d-51aa-4df7-bfa7-be70eb68f04e'

Referencing a column that appears in multiple tables and has multiple types

For this example, create two tables by running:

.create tables 
  Table1 (Level:string, Timestamp:datetime, ProcessId:string),
  Table2 (Level:string, Timestamp:datetime, ProcessId:int64)
  • The following query is executed as union.
find in (Table1, Table2) where ProcessId == 1001

The output result schema is (Level:string, Timestamp, ProcessId_string, ProcessId_int).

  • The following query is executed as union, but produces a different result schema.
find in (Table1, Table2) where ProcessId == 1001 project Level, Timestamp, ProcessId:string 

The output result schema is (Level:string, Timestamp, ProcessId_string)

15.14 - fork operator

Learn how to use the fork operator to run multiple consumer operators in parallel.

Runs multiple consumer operators in parallel.

Syntax

T | fork [name=](subquery) [name=](subquery)

Parameters

NameTypeRequiredDescription
subquerystring✔️A downstream pipeline of supported query operators.
namestringA temporary name for the subquery result table.

Supported query operators

Returns

Multiple result tables, one for each of the subquery arguments.

Tips

  • Use materialize as a replacement for join or union on fork legs. The input stream is cached by materialize, and then the cached expression can be used in join/union legs. See the sketch after these tips.

  • Use batch with materialize of tabular expression statements instead of the fork operator.
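
For example, a minimal sketch that caches the filtered input with materialize and then uses union instead of fork (based on the StormEvents examples below):

let FloridaStorms = materialize(StormEvents | where State == "FLORIDA");
union
    (FloridaStorms | where DeathsDirect + DeathsIndirect > 1),
    (FloridaStorms | where InjuriesDirect + InjuriesIndirect > 1)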

Examples

The examples output multiple result tables, produced by named and unnamed subqueries.

Unnamed subqueries

StormEvents
| where State == "FLORIDA"
| fork
    ( where DeathsDirect + DeathsIndirect > 1)
    ( where InjuriesDirect + InjuriesIndirect > 1)

Output

This output shows the first few rows and columns of the result table.

GenericResult

StartTimeEndTimeEpisodeIdEventIdStateEventTypeInjuriesDirectInjuriesIndirect
2007-02-02T03:17:00Z2007-02-02T03:25:00Z346418948FLORIDATornado100
2007-02-02T03:37:00Z2007-02-02T03:55:00Z346418950FLORIDATornado90
2007-03-13T08:20:00Z2007-03-13T08:20:00Z409422961FLORIDADense Fog30
2007-09-11T15:26:00Z2007-09-11T15:26:00Z957853798FLORIDARip Current00

GenericResult

StartTimeEndTimeEpisodeIdEventIdStateEventTypeInjuriesDirectInjuriesIndirect
2007-02-02T03:10:00Z2007-02-02T03:16:00Z254517515FLORIDATornado150
2007-02-02T03:17:00Z2007-02-02T03:25:00Z346418948FLORIDATornado100
2007-02-02T03:37:00Z2007-02-02T03:55:00Z346418950FLORIDATornado90
2007-02-02T03:55:00Z2007-02-02T04:10:00Z346420318FLORIDATornado420

Named subqueries

In the following examples, the result tables are named "StormsWithDeaths" and "StormsWithInjuries".

StormEvents
| where State == "FLORIDA"
| fork
    (where DeathsDirect + DeathsIndirect > 1 | as StormsWithDeaths)
    (where InjuriesDirect + InjuriesIndirect > 1 | as StormsWithInjuries)
StormEvents
| where State == "FLORIDA"
| fork
    StormsWithDeaths = (where DeathsDirect + DeathsIndirect > 1)
    StormsWithInjuries = (where InjuriesDirect + InjuriesIndirect > 1)

Output

This output shows the first few rows and columns of the result table.

StormsWithDeaths

StartTimeEndTimeEpisodeIdEventIdStateEventTypeInjuriesDirectInjuriesIndirect
2007-02-02T03:17:00Z2007-02-02T03:25:00Z346418948FLORIDATornado100
2007-02-02T03:37:00Z2007-02-02T03:55:00Z346418950FLORIDATornado90
2007-03-13T08:20:00Z2007-03-13T08:20:00Z409422961FLORIDADense Fog30
2007-09-11T15:26:00Z2007-09-11T15:26:00Z957853798FLORIDARip Current00

StormsWithInjuries

StartTimeEndTimeEpisodeIdEventIdStateEventTypeInjuriesDirectInjuriesIndirect
2007-02-02T03:10:00Z2007-02-02T03:16:00Z254517515FLORIDATornado150
2007-02-02T03:17:00Z2007-02-02T03:25:00Z346418948FLORIDATornado100
2007-02-02T03:37:00Z2007-02-02T03:55:00Z346418950FLORIDATornado90
2007-02-02T03:55:00Z2007-02-02T04:10:00Z346420318FLORIDATornado420

The following example uses named subqueries to produce two aggregated result tables:

SamplePowerRequirementHistorizedData
| fork
    Dataset2 = (where twinId != "p_sol_01" | summarize count() by twinId, name)
    Dataset3 = (summarize count() by WeekOfYear = week_of_year(timestamp))

Most KQL query operators can be used inside each fork subquery. However, some operators, such as join, don’t work inside a fork leg and are rejected by the engine.

15.15 - getschema operator

Learn how to use the getschema operator to create a tabular schema of the input.

Produce a table that represents a tabular schema of the input.

Syntax

T | getschema

Example

StormEvents
| getschema

Output

ColumnNameColumnOrdinalDataTypeColumnType
StartTime0System.DateTimedatetime
EndTime1System.DateTimedatetime
EpisodeId2System.Int32int
EventId3System.Int32int
State4System.Stringstring
EventType5System.Stringstring
InjuriesDirect6System.Int32int
InjuriesIndirect7System.Int32int
DeathsDirect8System.Int32int
DeathsIndirect9System.Int32int
DamageProperty10System.Int32int
DamageCrops11System.Int32int
Source12System.Stringstring
BeginLocation13System.Stringstring
EndLocation14System.Stringstring
BeginLat15System.Doublereal
BeginLon16System.Doublereal
EndLat17System.Doublereal
EndLon18System.Doublereal
EpisodeNarrative19System.Stringstring
EventNarrative20System.Stringstring
StormSummary21System.Objectdynamic
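
Because getschema returns a regular table, its output can be piped into further operators. For example, a minimal sketch that counts the StormEvents columns of each type:

StormEvents
| getschema
| summarize count() by ColumnType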

15.16 - invoke operator

Learn how to use the invoke operator to invoke a lambda expression that receives the source of invoke as a tabular parameter argument

Invokes a lambda expression that receives the source of invoke as a tabular argument.

Syntax

T | invoke function([param1, param2])

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular source.
functionstring✔️The name of the lambda let expression or stored function name to be evaluated.
param1, param2stringAny additional lambda arguments to pass to the function.

Returns

Returns the result of the evaluated expression.

Example

This example shows how to use the invoke operator to call a lambda expression defined in a let statement:

// clipped_average(): calculates percentiles limits, and then makes another 
//                    pass over the data to calculate average with values inside the percentiles
let clipped_average = (T:(x: long), lowPercentile:double, upPercentile:double)
{
   let high = toscalar(T | summarize percentiles(x, upPercentile));
   let low = toscalar(T | summarize percentiles(x, lowPercentile));
   T 
   | where x > low and x < high
   | summarize avg(x) 
};
range x from 1 to 100 step 1
| invoke clipped_average(5, 99)

Output

avg_x
52

15.17 - lookup operator

Learn how to use the lookup operator to extend columns of a fact table.

Extends the columns of a fact table with values looked-up in a dimension table.

For example, the following query results in a table that extends the FactTable ($left) with data from the DimensionTable ($right) by performing a lookup. The lookup matches each pair (CommonColumn, Col1) from FactTable with each pair (CommonColumn, Col2) in the DimensionTable. For the differences between fact and dimension tables, see fact and dimension tables.

FactTable | lookup kind=leftouter (DimensionTable) on CommonColumn, $left.Col1 == $right.Col2

The lookup operator performs an operation similar to the join operator with the following differences:

  • The result doesn’t repeat columns from the $right table that are the basis for the join operation.
  • Only two kinds of lookup are supported, leftouter and inner, with leftouter being the default.
  • In terms of performance, the system by default assumes that the $left table is the larger (facts) table, and the $right table is the smaller (dimensions) table. This is exactly opposite to the assumption used by the join operator.
  • The lookup operator automatically broadcasts the $right table to the $left table (essentially, behaves as if hint.broadcast was specified). This limits the size of the $right table.

Syntax

LeftTable | lookup [kind = (leftouter|inner)] (RightTable) on Attributes

Parameters

NameTypeRequiredDescription
LeftTablestring✔️The table or tabular expression that is the basis for the lookup. Denoted as $left.
RightTablestring✔️The table or tabular expression that is used to “populate” new columns in the fact table. Denoted as $right.
Attributesstring✔️A comma-delimited list of one or more rules that describe how rows from LeftTable are matched to rows from RightTable. Multiple rules are evaluated using the and logical operator. See Rules.
kindstringDetermines how to treat rows in LeftTable that have no match in RightTable. By default, leftouter is used, which means all those rows appear in the output with null values used for the missing values of RightTable columns added by the operator. If inner is used, such rows are omitted from the output. Other kinds of join aren’t supported by the lookup operator.

Rules

Rule kindSyntaxPredicate
Equality by nameColumnNamewhere LeftTable.ColumnName == RightTable.ColumnName
Equality by value$left.LeftColumn == $right.RightColumnwhere $left.LeftColumn == $right.RightColumn

Returns

A table with:

  • A column for every column in each of the two tables, including the matching keys. The columns of the right side are automatically renamed if there are name conflicts.
  • A row for every match between the input tables. A match is a row selected from one table that has the same value for all the on fields as a row in the other table.
  • The Attributes (lookup keys) appear only once in the output table.
  • If kind is unspecified or kind=leftouter, then in addition to the inner matches, there’s a row for every row on the left, even if it has no match. In that case, the unmatched output cells contain nulls.
  • If kind=inner, then there’s a row in the output for every combination of matching rows from left and right.

Example

The following example shows how to perform a left outer join between the FactTable and DimTable, based on matching values in the Personal and Family columns.

let FactTable=datatable(Row:string,Personal:string,Family:string) [
  "1", "Rowan",   "Murphy",
  "2", "Ellis",   "Turner",
  "3", "Ellis",   "Turner",
  "4", "Maya",  "Robinson",
  "5", "Quinn",    "Campbell"
];
let DimTable=datatable(Personal:string,Family:string,Alias:string) [
  "Rowan",  "Murphy",   "rowanm",
  "Ellis",  "Turner", "ellist",
  "Maya", "Robinson", "mayar",
  "Quinn",   "Campbell",    "quinnc"
];
FactTable
| lookup kind=leftouter DimTable on Personal, Family

Output

RowPersonalFamilyAlias
1RowanMurphyrowanm
2EllisTurnerellist
3EllisTurnerellist
4MayaRobinsonmayar
5QuinnCampbellquinnc
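
The example above uses kind=leftouter. As a minimal sketch of kind=inner, the following hypothetical tables include one fact row ("Chris") with no match in the dimension table; with kind=inner that row is omitted, whereas kind=leftouter would keep it with a null Alias:

let Facts = datatable(Row: string, Personal: string) [
  "1", "Rowan",
  "2", "Ellis",
  "3", "Chris"
];
let Dims = datatable(Personal: string, Alias: string) [
  "Rowan", "rowanm",
  "Ellis", "ellist"
];
Facts
| lookup kind=inner Dims on Personal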

15.18 - mv-apply operator

Learn how to use the mv-apply operator to apply a subquery to each record and union the results of each subquery.

Applies a subquery to each record, and returns the union of the results of all subqueries.

For example, assume a table T has a column Metric of type dynamic whose values are arrays of real numbers. The following query locates the two biggest values in each Metric value, and returns the records corresponding to these values.

T | mv-apply Metric to typeof(real) on 
(
   top 2 by Metric desc
)

The mv-apply operator has the following processing steps:

  1. Uses the mv-expand operator to expand each record in the input into subtables (order is preserved).
  2. Applies the subquery for each of the subtables.
  3. Adds zero or more columns to the resulting subtable. These columns contain the values of the source columns that aren’t expanded, and are repeated where needed.
  4. Returns the union of the results.

The mv-apply operator gets the following inputs:

  1. One or more expressions that evaluate into dynamic arrays to expand. The number of records in each expanded subtable is the maximum length of each of those dynamic arrays. Null values are added where multiple expressions are specified and the corresponding arrays have different lengths.

  2. Optionally, the names to assign the values of the expressions after expansion. These names become the columns names in the subtables. If not specified, the original name of the column is used when the expression is a column reference. A random name is used otherwise.

    [!NOTE] It is recommended to use the default column names.

  3. The data types of the elements of those dynamic arrays, after expansion. These become the column types of the columns in the subtables. If not specified, dynamic is used.

  4. Optionally, the name of a column to add to the subtables that specifies the 0-based index of the element in the array that resulted in the subtable record.

  5. Optionally, the maximum number of array elements to expand.

The mv-apply operator can be thought of as a generalization of the mv-expand operator (in fact, the latter can be implemented by the former, if the subquery includes only projections.)
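
For example, a minimal sketch of this equivalence: applying a subquery that only projects the expanded column reproduces the behavior of mv-expand over the same input (the datatable below is a hypothetical input for illustration):

datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20]),
    2, dynamic([30, 40])
]
| mv-apply b to typeof(long) on ( project b )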

Syntax

T | mv-apply [ItemIndex] ColumnsToExpand [RowLimit] on ( SubQuery )

Where ItemIndex has the syntax:

with_itemindex = IndexColumnName

ColumnsToExpand is a comma-separated list of one or more elements of the form:

[Name =] ArrayExpression [to typeof (Typename)]

RowLimit is simply:

limit RowLimit

and SubQuery has the same syntax of any query statement.

Parameters

NameTypeRequiredDescription
ItemIndexstringIndicates the name of a column of type long that’s appended to the input as part of the array-expansion phase and indicates the 0-based array index of the expanded value.
NamestringThe name to assign the array-expanded values of each array-expanded expression. If not specified, the name of the column is used if available. A random name is generated if ArrayExpression isn’t a simple column name.
ArrayExpressiondynamic✔️The array whose values are array-expanded. If the expression is the name of a column in the input, the input column is removed from the input and a new column of the same name, or ColumnName if specified, appears in the output.
TypenamestringThe name of the type that the individual elements of the dynamic array ArrayExpression take. Elements that don’t conform to this type are replaced by a null value. If unspecified, dynamic is used by default.
RowLimitintA limit on the number of records to generate from each record of the input. If unspecified, 2147483647 is used.
SubQuerystringA tabular query expression with an implicit tabular source that gets applied to each array-expanded subtable.
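
The following sketch demonstrates the limit clause from the syntax above; it expands at most the first two elements of each list and collects them. The _data table matches the one defined in the examples that follow:

let _data =
    range x from 1 to 8 step 1
    | summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply element=l to typeof(long) limit 2 on
    (
    summarize FirstTwo=make_list(element)
    )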

Examples

Review the examples and run them in your Data Explorer query page.

Getting the largest element from the array

The query outputs the largest odd number (7) and the largest even number (8).

let _data =
    range x from 1 to 8 step 1
    | summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply element=l to typeof(long) on 
    (
    top 1 by element
    )

Output

xMod2lelement
1[1, 3, 5, 7]7
0[2, 4, 6, 8]8

Calculating the sum of the largest two elements in an array

The query outputs the sum of the top 2 even numbers (6 + 8 = 14) and the sum of the top 2 odd numbers (5 + 7 = 12).

let _data =
    range x from 1 to 8 step 1
    | summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply l to typeof(long) on
    (
    top 2 by l
    | summarize SumOfTop2=sum(l)
    )

Output

xMod2lSumOfTop2
1[1,3,5,7]12
0[2,4,6,8]14

Select elements in arrays

The query identifies the top 2 elements from each dynamic array based on the Arr2 values and summarizes them into new lists.

datatable (Val:int, Arr1:dynamic, Arr2:dynamic)
[ 1, dynamic(['A1', 'A2', 'A3']),       dynamic([10, 30, 7]), 
  7, dynamic(['B1', 'B2', 'B5']),       dynamic([15, 11, 50]),
  3, dynamic(['C1', 'C2', 'C3', 'C4']), dynamic([6, 40, 20, 8])
] 
| mv-apply NewArr1=Arr1, NewArr2=Arr2 to typeof(long) on (
 top 2 by NewArr2
 | summarize NewArr1=make_list(NewArr1), NewArr2=make_list(NewArr2)
)

Output

ValArr1Arr2NewArr1NewArr2
1[“A1”,“A2”,“A3”][10,30,7][“A2”,“A1”][30,10]
7[“B1”,“B2”,“B5”][15,11,50][“B5”,“B1”][50,15]
3[“C1”,“C2”,“C3”,“C4”][6,40,20,8][“C2”,“C3”][40,20]

Using with_itemindex for working with a subset of the array

The query results in a table with rows where the index is 3 or greater, including the index and element values from the original lists of even and odd numbers.

let _data =
    range x from 1 to 10 step 1
    | summarize l=make_list(x) by xMod2 = x % 2;
_data
| mv-apply with_itemindex=index element=l to typeof(long) on 
    (
    // here you have 'index' column
    where index >= 3
    )
| project index, element

Output

indexelement
37
49
38
410

Using multiple columns to join elements of two arrays

The query combines elements from two dynamic arrays into a new concatenated format and then summarizes them into lists.

datatable (Val: int, Arr1: dynamic, Arr2: dynamic)
[
    1, dynamic(['A1', 'A2', 'A3']), dynamic(['B1', 'B2', 'B3']), 
    5, dynamic(['C1', 'C2']), dynamic(['D1', 'D2'])
] 
| mv-apply Arr1, Arr2 on (
    extend Out = strcat(Arr1, "_", Arr2)
    | summarize Arr1 = make_list(Arr1), Arr2 = make_list(Arr2), Out= make_list(Out)
    )

Output

ValArr1Arr2Out
1[“A1”,“A2”,“A3”][“B1”,“B2”,“B3”][“A1_B1”,“A2_B2”,“A3_B3”]
5[“C1”,“C2”][“D1”,“D2”][“C1_D1”,“C2_D2”]

Applying mv-apply to a property bag

This query dynamically removes properties from the packed values object based on the criteria that their values do not start with “555”. The final result contains the original columns with unwanted properties removed.

datatable(SourceNumber: string, TargetNumber: string, CharsCount: long)
[
    '555-555-1234', '555-555-1212', 46,
    '555-555-1212', '', int(null)
]
| extend values = pack_all()
| mv-apply removeProperties = values on 
    (
    mv-expand kind = array values
    | where values[1] !startswith "555"
    | summarize propsToRemove = make_set(values[0])
    )
| extend values = bag_remove_keys(values, propsToRemove)
| project-away propsToRemove

Output

SourceNumberTargetNumberCharsCountvalues
555-555-1234555-555-121246{
“SourceNumber”: “555-555-1234”,
“TargetNumber”: “555-555-1212”
}
555-555-1212  {
“SourceNumber”: “555-555-1212”
}

15.19 - mv-expand operator

Learn how to use the mv-expand operator to expand multi-value dynamic arrays or property bags into multiple records.

Expands multi-value dynamic arrays or property bags into multiple records.

mv-expand can be described as the opposite of the aggregation operators that pack multiple values into a single dynamic-typed array or property bag, such as summarize make_list() and make-series. Each element in the (scalar) array or property bag generates a new record in the output of the operator. All columns of the input that aren’t expanded are duplicated to all the records in the output.

Syntax

T | mv-expand [kind=(bag | array)] [with_itemindex=IndexColumnName] ColumnName [to typeof(Typename)] [, ColumnName …] [limit Rowlimit]

T | mv-expand [kind=(bag | array)] [Name =] ArrayExpression [to typeof(Typename)] [, [Name =] ArrayExpression [to typeof(Typename)] …] [limit Rowlimit]

Parameters

NameTypeRequiredDescription
ColumnName, ArrayExpressionstring✔️A column reference, or a scalar expression with a value of type dynamic that holds an array or a property bag. The individual top-level elements of the array or property bag get expanded into multiple records.
When ArrayExpression is used and Name doesn’t equal any input column name, the expanded value is extended into a new column in the output. Otherwise, the existing ColumnName is replaced.
NamestringA name for the new column.
Typenamestring✔️Indicates the underlying type of the array’s elements, which becomes the type of the column produced by the mv-expand operator. The operation of applying type is cast-only and doesn’t include parsing or type-conversion. Array elements that don’t conform with the declared type become null values.
RowLimitintThe maximum number of rows generated from each original row. The default is 2147483647. mvexpand is a legacy and obsolete form of the operator mv-expand. The legacy version has a default row limit of 128.
IndexColumnNamestringIf with_itemindex is specified, the output includes another column named IndexColumnName that contains the index starting at 0 of the item in the original expanded collection.

Returns

For each record in the input, the operator returns zero, one, or many records in the output, as determined in the following way:

  1. Input columns that aren’t expanded appear in the output with their original value. If a single input record is expanded into multiple output records, the value is duplicated to all records.

  2. For each ColumnName or ArrayExpression that is expanded, the number of output records is determined for each value as explained in modes of expansion. For each input record, the maximum number of output records is calculated. All arrays or property bags are expanded “in parallel” so that missing values (if any) are replaced by null values. Elements are expanded into rows in the order that they appear in the original array/bag.

  3. If the dynamic value is null, then a single record is produced for that value (null). If the dynamic value is an empty array or property bag, no record is produced for that value. Otherwise, as many records are produced as there are elements in the dynamic value.

The expanded columns are of type dynamic, unless they’re explicitly typed by using the to typeof() clause.
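
A minimal sketch of the null and empty-array behavior described above (hypothetical input): the row with a null value yields a single record with a null b, the row with an empty array yields no record, and the last row yields one record per element:

datatable (a: int, b: dynamic)
[
    1, dynamic(null),
    2, dynamic([]),
    3, dynamic([10, 20])
]
| mv-expand b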

Modes of expansion

Two modes of property bag expansions are supported:

  • kind=bag or bagexpansion=bag: Property bags are expanded into single-entry property bags. This mode is the default mode.
  • kind=array or bagexpansion=array: Property bags are expanded into two-element [key,value] array structures, allowing uniform access to keys and values. This mode also allows, for example, running a distinct-count aggregation over property names.
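
For example, a minimal sketch of the distinct-count scenario mentioned above, using kind=array to count distinct property names across a hypothetical dynamic column:

datatable (d: dynamic)
[
    dynamic({"prop1": "a", "prop2": "b"}),
    dynamic({"prop1": "c", "prop3": "d"})
]
| mv-expand kind=array d
| summarize dcount(tostring(d[0]))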

Examples

The examples in this section show how to use the syntax to help you get started.

Single column - array expansion

datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20]),
    2, dynamic(['a', 'b'])
]
| mv-expand b

Output

ab
110
120
2a
2b

Single column - bag expansion

A simple expansion of a single column:

datatable (a: int, b: dynamic)
[
    1, dynamic({"prop1": "a1", "prop2": "b1"}),
    2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand b

Output

ab
1{“prop1”: “a1”}
1{“prop2”: “b1”}
2{“prop1”: “a2”}
2{“prop2”: “b2”}

Single column - bag expansion to key-value pairs

A simple bag expansion to key-value pairs:

datatable (a: int, b: dynamic)
[
    1, dynamic({"prop1": "a1", "prop2": "b1"}),
    2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand kind=array b 
| extend key = b[0], val=b[1]

Output

abkeyval
1[“prop1”,“a1”]prop1a1
1[“prop2”,“b1”]prop2b1
2[“prop1”,“a2”]prop1a2
2[“prop2”,“b2”]prop2b2

Zipped two columns

Expanding two columns will first ‘zip’ the applicable columns and then expand them:

datatable (a: int, b: dynamic, c: dynamic)[
    1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 4, 3])
]
| mv-expand b, c

Output

abc
1{“prop1”:“a”}5
1{“prop2”:“b”}4
13

Cartesian product of two columns

If you want to get a Cartesian product of expanding two columns, expand one after the other:

datatable (a: int, b: dynamic, c: dynamic)
[
    1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 6])
]
| mv-expand b
| mv-expand c

Output

abc
1{ “prop1”: “a”}5
1{ “prop1”: “a”}6
1{ “prop2”: “b”}5
1{ “prop2”: “b”}6

Convert output

To force the output of an mv-expand to a certain type (default is dynamic), use to typeof:

datatable (a: string, b: dynamic, c: dynamic)[
    "Constant", dynamic([1, 2, 3, 4]), dynamic([6, 7, 8, 9])
]
| mv-expand b, c to typeof(int)
| getschema 

Output

ColumnNameColumnOrdinalDataTypeColumnType
a0System.Stringstring
b1System.Objectdynamic
c2System.Int32int

Notice column b is returned as dynamic while c is returned as int.

Using with_itemindex

Expansion of an array with with_itemindex:

range x from 1 to 4 step 1
| summarize x = make_list(x)
| mv-expand with_itemindex=Index x

Output

xIndex
10
21
32
43

15.20 - parse operator

Learn how to use the parse operator to parse the value of a string expression into one or more calculated columns.

Evaluates a string expression and parses its value into one or more calculated columns. The calculated columns return null values for unsuccessfully parsed strings. If there’s no need to use rows where parsing doesn’t succeed, prefer using the parse-where operator.

Syntax

T | parse [ kind=kind [ flags=regexFlags ]] expression with [ * ] stringConstant columnName [: columnType] [ * ] ,

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to parse.
kindstring✔️One of the supported kind values. The default value is simple.
regexFlagsstringIf kind is regex, then you can specify regex flags to be used like U for ungreedy, m for multi-line mode, s for match new line \n, and i for case-insensitive. More flags can be found in Flags.
expressionstring✔️An expression that evaluates to a string.
stringConstantstring✔️A string constant for which to search and parse.
columnNamestring✔️The name of a column to assign a value to, extracted from the string expression.
columnTypestringThe scalar value that indicates the type to convert the value to. The default is string.

Supported kind values

TextDescription
simpleThis is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types.
regexstringConstant can be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types.
relaxedstringConstant is a regular string value and the match is relaxed. All string delimiters should appear in the parsed string, but extended columns might partially match the required types. Extended columns that didn’t match the required types get the value null.

Regex mode

In regex mode, parse translates the pattern to a regex. Use regular expressions to do the matching and use numbered captured groups that are handled internally. For example:

parse kind=regex Col with * <regex1> var1:string <regex2> var2:long

In the parse statement, the regex internally generated by the parse is .*?<regex1>(.*?)<regex2>(\-\d+).

  • * was translated to .*?.

  • string was translated to .*?.

  • long was translated to \-\d+.

Returns

The input table extended according to the list of columns that are provided to the operator.

Examples

The examples in this section show how to use the syntax to help you get started.

The parse operator provides a streamlined way to extend a table by using multiple extract applications on the same string expression. This result is useful when the table has a string column that contains several values that you want to break into individual columns. For example, a column that’s produced by a developer trace ("printf"/"Console.WriteLine") statement.

Parse and extend results

In the following example, the column EventText of table Traces contains strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices={1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5}). The operation extends the table with six columns: resourceName, totalSlices, sliceNumber, lockTime, releaseTime, and previousLockTime.

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *  
| project resourceName, totalSlices, sliceNumber, lockTime, releaseTime, previousLockTime

Output

resourceNametotalSlicessliceNumberlockTimereleaseTimepreviousLockTime
PipelineScheduler271502/17/2016 08:40:002016-02-17 08:40:00.00000002016-02-17 08:39:00.0000000
PipelineScheduler272302/17/2016 08:40:012016-02-17 08:40:01.00000002016-02-17 08:39:01.0000000
PipelineScheduler272002/17/2016 08:40:012016-02-17 08:40:01.00000002016-02-17 08:39:01.0000000
PipelineScheduler271602/17/2016 08:41:002016-02-17 08:41:00.00000002016-02-17 08:40:00.0000000
PipelineScheduler272202/17/2016 08:41:012016-02-17 08:41:00.00000002016-02-17 08:40:01.0000000

Extract email alias and DNS

In the following example, entries from the Contacts table are parsed to extract the alias and domain from an email address, and the domain from a website URL. The query returns the EmailAddress, EmailAlias, and WebsiteDomain columns, where the fullEmail column combines the parsed email aliases and domains.

let Leads=datatable(Contacts: string)
    [
    "Event: LeadContact (email=john@contosohotel.com, Website=https:contosohotel.com)",
	"Event: LeadContact (email=abi@fourthcoffee.com, Website=https:www.fourthcoffee.com)",
	"Event: LeadContact (email=nevena@treyresearch.com, Website=https:treyresearch.com)",
	"Event: LeadContact (email=faruk@tailspintoys.com, Website=https:tailspintoys.com)",
	"Event: LeadContact (email=ebere@relecloud.com, Website=https:relecloud.com)",
];
Leads
| parse Contacts with * "email=" alias:string "@" domain: string ", Website=https:" WebsiteDomain: string ")"
| project EmailAddress=strcat(alias, "@", domain), EmailAlias=alias, WebsiteDomain

Output

EmailAddressEmailAliasWebsiteDomain
nevena@treyresearch.comnevenatreyresearch.com
john@contosohotel.comjohncontosohotel.com
faruk@tailspintoys.comfaruktailspintoys.com
ebere@relecloud.comebererelecloud.com
abi@fourthcoffee.comabiwww.fourthcoffee.com

Regex mode

In the following example, regular expressions are used to parse and extract data from the EventText column. The extracted data is projected into new fields.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse kind=regex EventText with "(.*?)[a-zA-Z]*=" resourceName @", totalSlices=\s*\d+\s*.*?sliceNumber=" sliceNumber: long  ".*?(previous)?lockTime=" lockTime ".*?releaseTime=" releaseTime ".*?previousLockTime=" previousLockTime: date "\\)"  
| project resourceName, sliceNumber, lockTime, releaseTime, previousLockTime

Output

resourceNamesliceNumberlockTimereleaseTimepreviousLockTime
PipelineScheduler1502/17/2016 08:40:00,02/17/2016 08:40:00,2016-02-17 08:39:00.0000000
PipelineScheduler2302/17/2016 08:40:01,02/17/2016 08:40:01,2016-02-17 08:39:01.0000000
PipelineScheduler2002/17/2016 08:40:01,02/17/2016 08:40:01,2016-02-17 08:39:01.0000000
PipelineScheduler1602/17/2016 08:41:00,02/17/2016 08:41:00,2016-02-17 08:40:00.0000000
PipelineScheduler2202/17/2016 08:41:01,02/17/2016 08:41:00,2016-02-17 08:40:01.0000000

Regex mode with regex flags

In the following example resourceName is extracted.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex EventText with * "resourceName=" resourceName ',' *
| project resourceName

Output

resourceName
PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01
PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00
PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01
PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00
PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00

If there are records where resourceName sometimes appears as lower-case and sometimes as upper-case, you might get nulls for some values.

The results in the previous example are unexpected, and include the full event data, because the default mode is greedy. To extract only resourceName, run the previous query with the non-greedy flag U and the case-insensitive flag i.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=Ui EventText with * "RESOURCENAME=" resourceName ',' *
| project resourceName

Output

resourceName
PipelineScheduler
PipelineScheduler
PipelineScheduler
PipelineScheduler
PipelineScheduler

If the parsed string has newlines, use the s flag so that the pattern also matches newline characters.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=23\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=15\nlockTime=02/17/2016 08:40:00\nreleaseTime=02/17/2016 08:40:00\npreviousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=20\nlockTime=02/17/2016 08:40:01\nreleaseTime=02/17/2016 08:40:01\npreviousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=22\nlockTime=02/17/2016 08:41:01\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler\ntotalSlices=27\nsliceNumber=16\nlockTime=02/17/2016 08:41:00\nreleaseTime=02/17/2016 08:41:00\npreviousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=regex flags=s EventText with * "resourceName=" resourceName: string "(.*?)totalSlices=" totalSlices: long "(.*?)lockTime=" lockTime: datetime "(.*?)releaseTime=" releaseTime: datetime "(.*?)previousLockTime=" previousLockTime: datetime "\\)" 
| project-away EventText

Output

resourceNametotalSliceslockTimereleaseTimepreviousLockTime
PipelineScheduler
272016-02-17 08:40:00.00000002016-02-17 08:40:00.00000002016-02-17 08:39:00.0000000
PipelineScheduler
272016-02-17 08:40:01.00000002016-02-17 08:40:01.00000002016-02-17 08:39:01.0000000
PipelineScheduler
272016-02-17 08:40:01.00000002016-02-17 08:40:01.00000002016-02-17 08:39:01.0000000
PipelineScheduler
272016-02-17 08:41:00.00000002016-02-17 08:41:00.00000002016-02-17 08:40:00.0000000
PipelineScheduler
272016-02-17 08:41:01.00000002016-02-17 08:41:00.00000002016-02-17 08:40:01.0000000

Relaxed mode

In the following relaxed mode example, the extended column totalSlices must be of type long. However, in the parsed string, it has the value nonValidLongValue. For the extended column, releaseTime, the value nonValidDateTime can’t be parsed as datetime. These two extended columns result in null values while the other columns, such as sliceNumber, still result in the correct values.

If you use option kind = simple for the following query, you get null results for all extended columns. This option is strict on extended columns, and is the difference between relaxed and simple mode.

let Traces=datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=nonValidDateTime, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse kind=relaxed EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long ", sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *
| project-away EventText

Output

resourceNametotalSlicessliceNumberlockTimereleaseTimepreviousLockTime
PipelineScheduler271502/17/2016 08:40:002016-02-17 08:39:00.0000000
PipelineScheduler272302/17/2016 08:40:012016-02-17 08:39:01.0000000
PipelineScheduler2002/17/2016 08:40:012016-02-17 08:39:01.0000000
PipelineScheduler1602/17/2016 08:41:002016-02-17 08:41:00.00000002016-02-17 08:40:00.0000000
PipelineScheduler272202/17/2016 08:41:012016-02-17 08:41:00.00000002016-02-17 08:40:01.0000000

15.21 - parse-kv operator

Learn how to use the parse-kv operator to represent structured information extracted from a string expression in a key/value form.

Extracts structured information from a string expression and represents the information in a key/value form.

The following extraction modes are supported:

Syntax

Specified delimiter

T | parse-kv Expression as ( KeysList ) with ( pair_delimiter = PairDelimiter , kv_delimiter = KvDelimiter [, quote = QuoteChars … [, escape = EscapeChar …]] [, greedy = true] )

Nonspecified delimiter

T | parse-kv Expression as ( KeysList ) with ( [quote = QuoteChars … [, escape = EscapeChar …]] )

Regex

T | parse-kv Expression as ( KeysList ) with ( regex = RegexPattern )

Parameters

NameTypeRequiredDescription
Expressionstring✔️The expression from which to extract key values.
KeysListstring✔️A comma-separated list of key names and their value data types. The order of the keys doesn’t have to match the order in which they appear in the text.
PairDelimiterstringA delimiter that separates key value pairs from each other.
KvDelimiterstringA delimiter that separates keys from values.
QuoteCharsstringA one- or two-character string literal representing opening and closing quotes that key name or the extracted value may be wrapped with. The parameter can be repeated to specify a separate set of opening/closing quotes.
EscapeCharstringA one-character string literal describing a character that may be used for escaping special characters in a quoted value. The parameter can be repeated if multiple escape characters are used.
RegexPatternstringA regular expression containing two capturing groups exactly. The first group represents the key name, and the second group represents the key value.

Returns

The original input tabular expression T, extended with columns per specified keys to extract.

Examples

The examples in this section show how to use the syntax to help you get started.

Extraction with well-defined delimiters

In this query, keys and values are separated by well-defined delimiters: comma and colon characters.

print str="ThreadId:458745723, Machine:Node001, Text: The service is up, Level: Info"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (pair_delimiter=',', kv_delimiter=':')
| project-away str

Output

TextThreadIdMachine
The service is up458745723Node001

Extraction with value quoting

Sometimes key names or values are wrapped in quotes, which allow the values themselves to contain delimiter characters. The following examples show how a quote argument is used for extracting such values.

print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="connection aborted" "event time"=2021-01-01T10:00:54'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"')
| project-away str

Output

event timesrcdstbytesfailure
2021-01-01 10:00:54.000000010.1.1.12310.1.1.124125connection aborted

This query uses different opening and closing quotes:

print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure=(connection aborted) (event time)=(2021-01-01 10:00:54)'
| parse-kv str as (['event time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='()')
| project-away str

Output

event timesrcdstbytesfailure
2021-01-01 10:00:54.000000010.1.1.12310.1.1.124125connection aborted

The values themselves may contain properly escaped quote characters, as the following example shows:

print str='src=10.1.1.123 dst=10.1.1.124 bytes=125 failure="the remote host sent \\"bye!\\"" time=2021-01-01T10:00:54'
| parse-kv str as (['time']:datetime, src:string, dst:string, bytes:long, failure:string) with (pair_delimiter=' ', kv_delimiter='=', quote='"', escape='\\')
| project-away str

Output

timesrcdstbytesfailure
2021-01-01 10:00:54.000000010.1.1.12310.1.1.124125the remote host sent “bye!”
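
As noted in the parameters table, the quote parameter can be repeated to accept more than one quoting style in the same string. The following query is a minimal sketch of that usage; the sample string and values are illustrative rather than taken from the examples above:

print str='src="10.1.1.123" dst=(10.1.1.124) bytes=125'
| parse-kv str as (src:string, dst:string, bytes:long) with (pair_delimiter=' ', kv_delimiter='=', quote='"', quote='()')
| project-away str

Each quote argument declares a separate set of opening/closing quotes, so values wrapped in double quotes and values wrapped in parentheses should both be unquoted during extraction.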

Extraction in greedy mode

There are cases when unquoted values may contain pair delimiters. In such cases, use greedy mode to instruct the operator to scan until the next key appearance (or the end of the string) when looking for the end of a value.

The following examples compare how the operator works with and without the greedy mode specified:

print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=')
| project-away str

Output

namephonecity
John555New
print str='name=John Doe phone=555 5555 city=New York'
| parse-kv str as (name:string, phone:string, city:string) with (pair_delimiter=' ', kv_delimiter='=', greedy=true)
| project-away str

Output

namephonecity
John Doe555 5555New York

Extraction with no well-defined delimiters

In the following example, any nonalphanumeric character is considered a valid delimiter:

print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: Started"
| parse-kv str as (Text: string, ThreadId:long, Machine: string)
| project-away str

Output

TextThreadIdMachine
Started458745723Node001

Value quoting and escaping are allowed in this mode, as shown in the following example:

print str="2021-01-01T10:00:34 [INFO] ThreadId:458745723, Machine:Node001, Text: 'The service \\' is up'"
| parse-kv str as (Text: string, ThreadId:long, Machine: string) with (quote="'", escape='\\')
| project-away str

Output

TextThreadIdMachine
The service ’ is up458745723Node001

Extraction using regex

When delimiters alone don't define the text structure well enough, regular expression-based extraction can be useful.

print str=@'["referer url: https://hostname.com/redirect?dest=/?h=1234", "request url: https://hostname.com/?h=1234", "advertiser id: 24fefbca-cf27-4d62-a623-249c2ad30c73"]'
| parse-kv str as (['referer url']:string, ['request url']:string, ['advertiser id']: guid) with (regex=@'"([\w ]+)\s*:\s*([^"]*)"')
| project-away str

Output

referer urlrequest urladvertiser id
https://hostname.com/redirect?dest=/?h=1234https://hostname.com/?h=123424fefbca-cf27-4d62-a623-249c2ad30c73

15.22 - parse-where operator

Learn how to use the parse-where operator to parse the value of a string expression into one or more calculated columns.

Evaluates a string expression, and parses its value into one or more calculated columns. The result is only the successfully parsed strings.

parse-where parses the strings in the same way as parse, and filters out strings that were not parsed successfully.

See parse operator, which produces nulls for unsuccessfully parsed strings.

Syntax

T | parse-where [kind=kind [flags= regexFlags]] expression with * (stringConstant columnName [: columnType]) *

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to parse.
kindstring✔️One of the supported kind values. The default value is simple.
regexFlagsstringIf kind is regex, then you can specify regex flags to be used like U for ungreedy, m for multi-line mode, s for match new line \n, and i for case-insensitive. More flags can be found in Flags.
expressionstring✔️An expression that evaluates to a string.
stringConstantstring✔️A string constant for which to search and parse.
columnNamestring✔️The name of a column to assign a value to, extracted from the string expression.
columnTypestringThe scalar value that indicates the type to convert the value to. The default is string.

Supported kind values

TextDescription
simpleThis is the default value. stringConstant is a regular string value and the match is strict. All string delimiters should appear in the parsed string, and all extended columns must match the required types.
regexstringConstant may be a regular expression and the match is strict. All string delimiters, which can be a regex for this mode, should appear in the parsed string, and all extended columns must match the required types.

Regex mode

In regex mode, parse will translate the pattern to a regex and use regular expressions in order to do the matching using numbered captured groups that are handled internally. For example:

parse-where kind=regex Col with * <regex1> var1:string <regex2> var2:long

The regex that will be generated by the parse internally is .*?<regex1>(.*?)<regex2>(\-\d+).

  • * was translated to .*?.
  • string was translated to .*?.
  • long was translated to \-\d+.

Returns

The input table, which is extended according to the list of columns that are provided to the operator.

Examples

The examples in this section show how to use the syntax to help you get started.

The parse-where operator provides a streamlined way to extend a table by using multiple extract applications on the same string expression. This is most useful when the table has a string column that contains several values that you want to break into individual columns. For example, you can break up a column that was produced by a developer trace ("printf"/"Console.WriteLine") statement.

Using parse

In the example below, the column EventText of table Traces contains strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices= {1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5}). The operation below extends the table with six columns: resourceName, totalSlices, sliceNumber, lockTime, releaseTime, and previousLockTime.

A few of the strings don’t have a full match.

Using parse, the calculated columns will have nulls.

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *  
| project
    resourceName,
    totalSlices,
    sliceNumber,
    lockTime,
    releaseTime,
    previousLockTime

Output

resourceNametotalSlicessliceNumberlockTimereleaseTimepreviousLockTime
PipelineScheduler272002/17/2016 08:40:012016-02-17 08:40:01.00000002016-02-17 08:39:01.0000000
PipelineScheduler272202/17/2016 08:41:012016-02-17 08:41:00.00000002016-02-17 08:40:01.0000000

Using parse-where

Using parse-where filters out unsuccessfully parsed strings from the result.

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=invalid_number, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=invalid_datetime, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=invalid_number, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces  
| parse-where EventText with * "resourceName=" resourceName ", totalSlices=" totalSlices: long * "sliceNumber=" sliceNumber: long * "lockTime=" lockTime ", releaseTime=" releaseTime: date "," * "previousLockTime=" previousLockTime: date ")" *  
| project
    resourceName,
    totalSlices,
    sliceNumber,
    lockTime,
    releaseTime,
    previousLockTime

Output

resourceNametotalSlicessliceNumberlockTimereleaseTimepreviousLockTime
PipelineScheduler272002/17/2016 08:40:012016-02-17 08:40:01.00000002016-02-17 08:39:01.0000000
PipelineScheduler272202/17/2016 08:41:012016-02-17 08:41:00.00000002016-02-17 08:40:01.0000000

Regex mode using regex flags

To get the resourceName and totalSlices, use the following query:

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices

Output

resourceNametotalSlices

parse-where with case-insensitive regex flag

In the above query, the default mode was case-sensitive, so the strings weren't parsed successfully and no results were returned.

To get the required result, run parse-where with a case-insensitive (i) regex flag.

Only three strings will be parsed successfully, so the result is three records (some totalSlices hold invalid integers).

let Traces = datatable(EventText: string)
    [
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=11, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=non_valid_integer, sliceNumber=44, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)",
    "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)"
];
Traces
| parse-where kind = regex flags=i EventText with * "RESOURCENAME=" resourceName "," * "totalSlices=" totalSlices: long "," *
| project resourceName, totalSlices

Output

resourceNametotalSlices
PipelineScheduler27
PipelineScheduler27
PipelineScheduler27

15.23 - partition operator

Learn how to use the partition operator to partition the records of the input table into multiple subtables.

The partition operator partitions the records of its input table into multiple subtables according to values in a key column. The operator runs a subquery on each subtable, and produces a single output table that is the union of the results of all subqueries.

The partition operator is useful when you need to perform a subquery only on a subset of rows that belong to the same partition key, and not a query of the whole dataset. These subqueries could include aggregate functions, window functions, top N and others.

The partition operator supports several strategies of subquery operation:

  • Native - use with an implicit data source with thousands of key partition values.
  • Shuffle - use with an implicit source with millions of key partition values.
  • Legacy - use with an implicit or explicit source for 64 or fewer key partition values.

Syntax

T | partition [ hint.strategy=Strategy ] [ Hints ] by Column ( TransformationSubQuery )

T | partition [ hint.strategy=legacy ] [ Hints ] by Column { SubQueryWithSource }

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular source.
StrategystringThe value legacy, shuffle, or native. This hint defines the execution strategy of the partition operator.

If no strategy is specified, the legacy strategy is used. For more information, see Strategies.
Columnstring✔️The name of a column in T whose values determine how to partition the input tabular source.
TransformationSubQuerystring✔️A tabular transformation expression. The source is implicitly the subtables produced by partitioning the records of T. Each subtable is homogenous on the value of Column.

The expression must provide only one tabular result and shouldn’t have other types of statements, such as let statements.
SubQueryWithSourcestring✔️A tabular expression that includes its own tabular source, such as a table reference. This syntax is only supported with the legacy strategy. The subquery can only reference the key column, Column, from T. To reference the column, use the syntax toscalar(Column).

The expression must provide only one tabular result and shouldn’t have other types of statements, such as let statements.
HintsstringZero or more space-separated parameters in the form of: HintName = Value that control the behavior of the operator. See the supported hints per strategy type.

Supported hints

Hint nameTypeStrategyDescription
hint.shufflekeystringshuffleThe partition key used to run the partition operator with the shuffle strategy.
hint.materializedboollegacyIf set to true, materializes the source of the partition operator. The default value is false.
hint.concurrencyintlegacyDetermines how many partitions to run in parallel. The default value is 16.
hint.spreadintlegacyDetermines how to distribute the partitions among cluster nodes. The default value is 1.

For example, if there are N partitions and the spread hint is set to P, then the N partitions are processed by P different cluster nodes equally, in parallel/sequentially depending on the concurrency hint.
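
As a minimal sketch of how these hints combine with the legacy strategy, the following query (using the StormEvents sample table and illustrative hint values) processes a small number of State partitions two at a time on a single node:

StormEvents
| where State startswith 'W'
| partition hint.strategy=legacy hint.concurrency=2 hint.spread=1 by State
    (
    top 3 by DamageProperty
    | project State, EventType, DamageProperty
    )

The where clause keeps the number of distinct State values small, since the legacy strategy supports at most 64 partitions.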

Returns

The operator returns a union of the results of the individual subqueries.

Strategies

The partition operator supports several strategies of subquery operation: native, shuffle, and legacy.

Native strategy

This strategy should be applied when the number of distinct values of the partition key isn’t large, roughly in the thousands.

The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.

To use this strategy, specify hint.strategy=native.

Shuffle strategy

This strategy should be applied when the number of distinct values of the partition key is large, in the millions.

The subquery must be a tabular transformation that doesn’t specify a tabular source. The source is implicit and is assigned according to the subtable partitions. Only certain supported operators can be used in the subquery. There’s no restriction on the number of partitions.

To use this strategy, specify hint.strategy=shuffle. For more information about shuffle strategy and performance, see shuffle query.

Supported operators for the native and shuffle strategies

Only a limited set of operators can be used in subqueries with the native or shuffle strategies, such as the summarize, top, sort, project, and scan operators used in the examples in this article.

Legacy strategy

For historical reasons, the legacy strategy is the default strategy. However, we recommend favoring the native or shuffle strategies, as the legacy approach is limited to 64 partitions and is less efficient.

In some scenarios, the legacy strategy might be necessary due to its support for including a tabular source in the subquery. In such cases, the subquery can only reference the key column, Column, from the input tabular source, T. To reference the column, use the syntax toscalar(Column).

If the subquery is a tabular transformation without a tabular source, the source is implicit and is based on the subtable partitions.

To use this strategy, specify hint.strategy=legacy or omit any other strategy indication.

Examples

The examples in this section show how to use the syntax to help you get started.

Find top values

In some cases, it’s more performant and easier to write a query using the partition operator than using the top-nested operator. The following query runs a subquery calculating summarize and top for each State starting with W: “WYOMING”, “WASHINGTON”, “WEST VIRGINIA”, and “WISCONSIN”.

StormEvents
| where State startswith 'W'
| partition hint.strategy=native by State 
    (
    summarize Events=count(), Injuries=sum(InjuriesDirect) by EventType, State
    | top 3 by Events 
    ) 

Output

EventTypeStateEventsInjuries
HailWYOMING1080
High WindWYOMING815
Winter StormWYOMING720
Heavy SnowWASHINGTON820
High WindWASHINGTON5813
WildfireWASHINGTON290
Thunderstorm WindWEST VIRGINIA1801
HailWEST VIRGINIA1030
Winter WeatherWEST VIRGINIA880
Thunderstorm WindWISCONSIN4161
Winter StormWISCONSIN3100
HailWISCONSIN3031

Native strategy

The following query returns the top 2 EventType values by TotalInjuries for each State that starts with ‘W’:

StormEvents
| where State startswith 'W'
| partition hint.strategy = native by State
    (
    summarize TotalInjuries = sum(InjuriesDirect) by EventType
    | top 2 by TotalInjuries
    )

Output

EventTypeTotalInjuries
Tornado4
Hail1
Thunderstorm Wind1
Excessive Heat0
High Wind13
Lightning5
High Wind5
Avalanche3

Shuffle strategy

The following query returns the top 3 DamageProperty values for each EpisodeId, along with the EpisodeId and State columns, and then counts the resulting rows.

StormEvents
| partition hint.strategy=shuffle by EpisodeId
    (
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
    )
| count

Output

Count
22345

Legacy strategy with explicit source

The following query runs two subqueries:

  • When x == 1, the query returns all rows from StormEvents that have InjuriesIndirect == 1.
  • When x == 2, the query returns all rows from StormEvents that have InjuriesIndirect == 2.

The final result is the union of these two subqueries.

range x from 1 to 2 step 1
| partition hint.strategy=legacy by x {StormEvents | where x == InjuriesIndirect}
| count 

Output

Count
113

Partition reference

The following example shows how to use the as operator to give a “name” to each data partition and then reuse that name within the subquery. This approach is only relevant to the legacy strategy.

T
| partition by Dim
(
    as Partition
    | extend MetricPct = Metric * 100.0 / toscalar(Partition | summarize sum(Metric))
)

15.24 - print operator

Learn how to use the print operator to output a single row with one or more scalar expression results as columns.

Outputs a single row with one or more scalar expression results as columns.

Syntax

print [ColumnName =] ScalarExpression [, …]

Parameters

NameTypeRequiredDescription
ColumnNamestringThe name to assign to the output column.
ScalarExpressionstring✔️The expression to evaluate.

Returns

A table with one or more columns and a single row. Each column returns the corresponding value of the evaluated ScalarExpression.

Examples

The examples in this section show how to use the syntax to help you get started.

The following example outputs a row with two columns. One column contains the sum of a series of numbers and the other column contains the value of the variable, x.

print 0 + 1 + 2 + 3 + 4 + 5, x = "Wow!"

Output

print_0x
15Wow!

The following example outputs the results of the strcat() function as a concatenated string.

print banner=strcat("Hello", ", ", "World!")

Output

banner
Hello, World!

15.25 - Project operator

Learn how to use the project operator to select columns to include, rename or drop, and to insert new computed columns in the output table.

Select the columns to include, rename or drop, and insert new computed columns.

The order of the columns in the result is specified by the order of the arguments. Only the columns specified in the arguments are included in the result. Any other columns in the input are dropped.

Syntax

T | project [ColumnName | (ColumnName[,]) =] Expression [, …]

or

T | project ColumnName [= Expression] [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input for which to project certain columns.
ColumnNamestringA column name or comma-separated list of column names to appear in the output.
ExpressionstringThe scalar expression to perform over the input.
  • Either ColumnName or Expression must be specified.
  • If there’s no Expression, then a column of ColumnName must appear in the input.
  • If ColumnName is omitted, the output column name of Expression will be automatically generated.
  • If Expression returns more than one column, a list of column names can be specified in parentheses. If a list of the column names isn’t specified, all Expression’s output columns with generated names will be added to the output.

Returns

A table with the columns that were named as arguments. It contains the same number of rows as the input table.

Examples

The examples in this section show how to use the syntax to help you get started.

Only show specific columns

Only show the EventId, State, EventType of the StormEvents table.

StormEvents
| project EventId, State, EventType

Output

The table shows the first 10 results.

EventIdStateEventType
61032ATLANTIC SOUTHWaterspout
60904FLORIDAHeavy Rain
60913FLORIDATornado
64588GEORGIAThunderstorm Wind
68796MISSISSIPPIThunderstorm Wind
68814MISSISSIPPITornado
68834MISSISSIPPIThunderstorm Wind
68846MISSISSIPPIHail
73241AMERICAN SAMOAFlash Flood
64725KENTUCKYFlood

Potential manipulations using project

The following query renames the BeginLocation column and creates a new column called TotalInjuries from a calculation over two existing columns.

StormEvents
| project StartLocation = BeginLocation, TotalInjuries = InjuriesDirect + InjuriesIndirect
| where TotalInjuries > 5

Output

The table shows the first 10 results.

StartLocationTotalInjuries
LYDIA15
ROYAL15
GOTHENBURG9
PLAINS8
KNOXVILLE9
CAROL STREAM11
HOLLY9
RUFFIN9
ENTERPRISE MUNI ARPT50
COLLIERVILLE6

15.26 - project-away operator

Learn how to use the project-away operator to select columns from the input table to exclude from the output table.

Select which columns from the input table to exclude from the output table.

Syntax

T | project-away ColumnNameOrPattern [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input from which to remove columns.
ColumnNameOrPatternstring✔️One or more column names or column wildcard-patterns to be removed from the output.

Returns

A table with the columns that weren't named as arguments. It contains the same number of rows as the input table.

Examples

The input table PopulationData has 2 columns: State and Population. Project-away the Population column and you’re left with a list of state names.

PopulationData
| project-away Population

Output

The following table shows only the first 10 results.

State
ALABAMA
ALASKA
ARIZONA
ARKANSAS
CALIFORNIA
COLORADO
CONNECTICUT
DELAWARE
DISTRICT OF COLUMBIA
FLORIDA

Project-away using a column name pattern

This query removes columns starting with the word “session”.

ConferenceSessions
| project-away session*

Output

The table shows only the first 10 results.

conferenceownerparticipantsURLlevelstarttimedurationtime_and_durationkusto_affinity
PASS Summit 2019Avner Aharonihttps://www.eventbrite.com/e/near-real-time-interact-analytics-on-big-data-using-azure-data-explorer-fg-tickets-775327756192019-11-07T19:15:00ZThu, Nov 7, 11:15 AM-12:15 PM PSTFocused
PASS SummitRohan KumarAriel Pisetzkyhttps://www.pass.org/summit/2018/Learn/Keynotes.aspx2018-11-07T08:15:00Z90Wed, Nov 7, 8:15-9:45 amMention
Intelligent Cloud 2019Rohan KumarHenning Rauch2019-04-09T09:00:00Z90Tue, Apr 9, 9:00-10:30 AMMention
Ignite 2019Jie Fenghttps://myignite.techcommunity.microsoft.com/sessions/839401002019-11-06T14:35:00Z20Wed, Nov 6, 9:35 AM - 9:55 AMMention
Ignite 2019Bernhard RodeLe Hai Dang, Ricardo Niepelhttps://myignite.techcommunity.microsoft.com/sessions/815962002019-11-06T16:45:00Z45Wed, Nov 6, 11:45 AM-12:30 PMMention
Ignite 2019Tzvia GitlinTroynahttps://myignite.techcommunity.microsoft.com/sessions/839334002019-11-06T17:30:00Z75Wed, Nov 6, 12:30 PM-1:30 PMFocused
Ignite 2019Jie Fenghttps://myignite.techcommunity.microsoft.com/sessions/810573002019-11-06T20:30:00Z45Wed, Nov 6, 3:30 PM-4:15 PMMention
Ignite 2019Manoj Rahejahttps://myignite.techcommunity.microsoft.com/sessions/839393002019-11-07T18:15:00Z20Thu, Nov 7, 1:15 PM-1:35 PMFocused
Ignite 2019Uri Barashhttps://myignite.techcommunity.microsoft.com/sessions/810603002019-11-08T17:30:00Z45Fri, Nov8, 10:30 AM-11:15 AMFocused
Ignite 2018Manoj Rahejahttps://azure.microsoft.com/resources/videos/ignite-2018-azure-data-explorer-%E2%80%93-query-billions-of-records-in-seconds/20020Focused

15.27 - project-keep operator

Learn how to use the project-keep operator to select columns from the input to keep in the output.

Select which columns from the input to keep in the output. Only the columns specified as arguments are shown in the result; all other columns are excluded.

Syntax

T | project-keep ColumnNameOrPattern [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input from which to keep columns.
ColumnNameOrPatternstring✔️One or more column names or column wildcard-patterns to be kept in the output.

Returns

A table with the columns that were named as arguments. It contains the same number of rows as the input table.

Example

This query returns the columns from the ConferenceSessions table whose names start with "session".

ConferenceSessions
| project-keep session*

Output

The output table shows only the first 10 results.

sessionidsession_titlesession_typesession_location
COM64Focus Group: Azure Data ExplorerFocus GroupOnline
COM65Focus Group: Azure Data ExplorerFocus GroupOnline
COM08Ask the Team: Azure Data ExplorerAsk the TeamOnline
COM137Focus Group: Built-In Dashboard and Smart Auto Scaling Capabilities in Azure Data ExplorerFocus GroupOnline
CON-PRT157Roundtable: Monitoring and managing your Azure Data Explorer deploymentsRoundtableOnline
CON-PRT103Roundtable: Advanced Kusto query language topicsRoundtableOnline
CON-PRT157Roundtable: Monitoring and managing your Azure Data Explorer deploymentsRoundtableOnline
CON-PRT103Roundtable: Advanced Kusto query language topicsRoundtableOnline
CON-PRT130Roundtable: Data exploration and visualization with Azure Data ExplorerRoundtableOnline
CON-PRT130Roundtable: Data exploration and visualization with Azure Data ExplorerRoundtableOnline

15.28 - project-rename operator

Learn how to use the project-rename operator to rename columns in the output table.

Renames columns in the output table.

Syntax

T | project-rename NewColumnName = ExistingColumnName [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular data.
NewColumnNamestring✔️The new column name.
ExistingColumnNamestring✔️The name of the existing column to rename.

Returns

A table that has the columns in the same order as in an existing table, with columns renamed.

Example

If you have a table with columns a, b, and c, and you want to rename a to new_a and b to new_b while keeping the same order, the query would look like this:

print a='alpha', b='bravo', c='charlie'
| project-rename new_a=a, new_b=b, new_c=c

Output

new_anew_bnew_c
alphabravocharlie

15.29 - project-reorder operator

Learn how to use the project-reorder operator to reorder columns in the output table.

Reorders columns in the output table.

Syntax

T | project-reorder ColumnNameOrPattern [asc | desc | granny-asc | granny-desc] [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular data.
ColumnNameOrPatternstring✔️The name of the column or column wildcard pattern by which to order the columns.
asc, desc, granny-asc, granny-descstringIndicates how to order the columns when a wildcard pattern is used. asc or desc orders columns by column name in ascending or descending manner, respectively. granny-asc or granny-desc orders by ascending or descending, respectively, while secondarily sorting by the next numeric value. For example, a20 comes before a100 when granny-asc is specified.

Returns

A table that contains columns in the order specified by the operator arguments. project-reorder doesn't rename or remove columns from the table; therefore, all columns that existed in the source table appear in the result table.

Examples

The examples in this section show how to use the syntax to help you get started.

Reorder with b first

Reorder a table with three columns (a, b, c) so the second column (b) will appear first.

print a='a', b='b', c='c'
|  project-reorder b

Output

bac
bac

Reorder with a first

Reorder columns of a table so that columns starting with a will appear before other columns.

print b = 'b', a2='a2', a3='a3', a1='a1'
|  project-reorder a* asc

Output

a1a2a3b
a1a2a3b
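
Reorder with granny-asc

The following query is a minimal sketch of the granny-asc ordering described in the parameters table. With plain asc, the a* columns would sort as a100, a20, a3 (string order); with granny-asc, the numeric suffixes should be compared numerically, so the expected order is a3, a20, a100, followed by b.

print b='b', a100='a100', a20='a20', a3='a3'
|  project-reorder a* granny-asc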

15.30 - Queries

Learn how to use queries to explore and process data in the context of databases.

A query is a read-only operation against data ingested into your cluster. Queries always run in the context of a particular database in the cluster. They may also refer to data in another database, or even in another cluster.

Because ad hoc querying of data is the top-priority scenario for Kusto, the Kusto Query Language syntax is optimized for non-expert users to author and run queries over their data and to understand unambiguously what each query does (logically).

The language syntax is that of a data flow, where “data” means “tabular data” (data in one or more rows/columns rectangular shape). At a minimum, a query consists of source data references (references to Kusto tables) and one or more query operators applied in sequence, indicated visually by the use of a pipe character (|) to delimit operators.

For example:

StormEvents 
| where State == 'FLORIDA' and StartTime > datetime(2000-01-01)
| count

Each filter prefixed by the pipe character | is an instance of an operator, with some parameters. The input to the operator is the table that is the result of the preceding pipeline. In most cases, any parameters are scalar expressions over the columns of the input. In a few cases, the parameters are the names of input columns, and in a few cases, the parameter is a second table. The result of a query is always a table, even if it only has one column and one row.

T is used in queries to denote the preceding pipeline or source table.
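
For instance, the join operator takes a second table as a parameter. The following query is a minimal sketch, assuming the StormEvents and PopulationData sample tables used elsewhere in this article; it joins the two tables on their shared State column:

StormEvents
| where State == 'FLORIDA'
| join kind=inner (PopulationData) on State
| take 5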

15.31 - range operator

Learn how to use the range operator to generate a single-column table of values.

Generates a single-column table of values.

Syntax

range columnName from start to stop step step

Parameters

NameTypeRequiredDescription
columnNamestring✔️The name of the single column in the output table.
startint, long, real, datetime, or timespan✔️The smallest value in the output.
stopint, long, real, datetime, or timespan✔️The highest value generated in the output, or an upper bound on the last generated value if step steps over this value.
stepint, long, real, datetime, or timespan✔️The difference between two consecutive values.

Returns

A table with a single column called columnName, whose values are start, start + step, and so on, up to the last value that doesn't exceed stop.

Examples

The example in this section shows how to use the syntax to help you get started.

Range over the past seven days

The following example creates a table with one entry per day for the past seven days, each entry carrying the current time of day.

range LastWeek from ago(7d) to now() step 1d

Output

LastWeek
2015-12-05 09:10:04.627
2015-12-06 09:10:04.627
…
2015-12-12 09:10:04.627

Combine different stop times

The following example shows how to extend ranges to use multiple stop times by using the union operator.

let Range1 = range Time from datetime(2024-01-01) to datetime(2024-01-05) step 1d;
let Range2 = range Time from datetime(2024-01-06) to datetime(2024-01-10) step 1d;
union Range1, Range2
| order by Time asc

Output

Time
2024-01-01 00:00:00.0000000
2024-01-02 00:00:00.0000000
2024-01-03 00:00:00.0000000
2024-01-04 00:00:00.0000000
2024-01-05 00:00:00.0000000
2024-01-06 00:00:00.0000000
2024-01-07 00:00:00.0000000
2024-01-08 00:00:00.0000000
2024-01-09 00:00:00.0000000
2024-01-10 00:00:00.0000000

Range using parameters

The following example shows how to use the range operator with parameters, which are then extended and consumed as a table.

let toUnixTime = (dt:datetime) 
{ 
    (dt - datetime(1970-01-01)) / 1s 
};
let MyMonthStart = startofmonth(now()); //Start of month
let StepBy = 4.534h; //Supported timespans
let nn = 64000; // Row Count parametrized
let MyTimeline = range MyMonthHour from MyMonthStart to now() step StepBy
| extend MyMonthHourinUnixTime = toUnixTime(MyMonthHour), DateOnly = bin(MyMonthHour,1d), TimeOnly = MyMonthHour - bin(MyMonthHour,1d)
; MyTimeline | order by MyMonthHour asc | take nn

Output

MyMonthHourMyMonthHourinUnixTimeDateOnlyTimeOnly
2023-02-0100:00:00.000000016752096002023-02-01 00:00:00.0000000
2023-02-0104:32:02.40000001675225922.42023-02-01 00:00:00.0000000
2023-02-0109:04:04.80000001675242244.82023-02-01 00:00:00.0000000
2023-02-0113:36:07.20000001675258567.22023-02-01 00:00:00.0000000

Incremented steps

The following example creates a table with a single column called Steps whose type is long and results in values from one to eight incremented by three.

range Steps from 1 to 8 step 3

Output

Steps
1
4
7

Traces over a time range

The following example shows how the range operator can be used to create a dimension table that is used to introduce zeros where the source data has no values. It takes timestamps from the last four hours and counts traces for each one-minute interval. When there are no traces for a specific interval, the count is zero.

range TIMESTAMP from ago(4h) to now() step 1m
| join kind=fullouter
  (Traces
      | where TIMESTAMP > ago(4h)
      | summarize Count=count() by bin(TIMESTAMP, 1m)
  ) on TIMESTAMP
| project Count=iff(isnull(Count), 0, Count), TIMESTAMP
| render timechart  

15.32 - reduce operator

Learn how to use the reduce operator to group a set of strings together based on value similarity.

Groups a set of strings together based on value similarity.

For each such group, the operator returns a pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard. The count is the number of values in the group, and the representative is one of the original values in the group.

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Parameters

NameTypeRequiredDescription
Exprstring✔️The value by which to reduce.
ThresholdrealA value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1.

We recommend setting a small threshold value for large inputs. With a smaller threshold value, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value requires less similarity, resulting in more groups that are less similar. See Examples.
ReduceKindstringThe only valid value is source. If source is specified, the operator appends the Pattern column to the existing rows in the table instead of aggregating by Pattern.

Returns

A table with as many rows as there are groups, and with columns titled pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard, or placeholder for an arbitrary insertion string. The count is the number of values in the group, and the representative is one of the original values in the group.

For example, the result of reduce by city might include:

PatternCountRepresentative
San *5182San Bernard
Saint *2846Saint Lucy
Moscow3726Moscow
* -on- *2730One -on- One
Paris2716Paris

Examples

The example in this section shows how to use the syntax to help you get started.

Small threshold value

This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.001 , characters = "X" 

Output

PatternCountRepresentative
MachineLearning*1000MachineLearningX4

Large threshold value

This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.9 , characters = "X" 

Output

With the larger threshold value, the values are grouped less aggressively, so the result contains more groups than in the small threshold example.

PatternCountRepresentative
MachineLearning*177MachineLearningX9
MachineLearning*102MachineLearningX0
MachineLearning*106MachineLearningX1
MachineLearning*96MachineLearningX6
MachineLearning*110MachineLearningX4
MachineLearning*100MachineLearningX3
MachineLearning*99MachineLearningX8
MachineLearning*104MachineLearningX7
MachineLearning*106MachineLearningX2

Behavior of Characters parameter

If the Characters parameter is unspecified, then every character that isn't an ASCII alphanumeric character becomes a term separator.

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

PatternCountRepresentative
others10

However, if you specify that “Z” is a separator, then it’s as if each value in str is two terms: foo and tostring(x):

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

Output

PatternCountRepresentative
foo*10fooZ1

Apply reduce to sanitized input

The following example shows how one might apply the reduce operator to a “sanitized” input, in which GUIDs in the column being reduced are replaced before reducing:

Start with a few records from the Trace table. Then reduce the Text column, which includes random GUIDs. Because random GUIDs interfere with the reduce operation, replace them all with the string "GUID". Now perform the reduce operation. In case there are other "quasi-random" identifiers with embedded '-' or '_' characters, treat those characters as non-term-breakers by passing them in the characters parameter.

Trace
| take 10000
| extend Text = replace(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID", Text)
| reduce by Text with characters="-_"
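
Append the pattern to each row

The following query is a minimal sketch of the kind=source option described in the parameters table. Instead of aggregating by Pattern, the operator should append a Pattern column to each of the original rows:

range x from 1 to 10 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce kind=source by MyText with characters="X"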

15.33 - sample operator

Learn how to use the sample operator to return up to the specified number of rows from the input table.

Returns up to the specified number of random rows from the input table.

Syntax

T | sample NumberOfRows

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
NumberOfRowsint, long, or real✔️The number of rows to return. You can specify any numeric expression.

Examples

The example in this section shows how to use the syntax to help you get started.

Generate a sample

This query creates a range of numbers, samples one value, and references that sample twice. Because _sample isn't materialized, the sample can be recalculated for each reference, so two different values may be returned.

let _data = range x from 1 to 100 step 1;
let _sample = _data | sample 1;
union (_sample), (_sample)

Output

x
74
63

To ensure that _sample in the example above is calculated only once, use the materialize() function:

let _data = range x from 1 to 100 step 1;
let _sample = materialize(_data | sample 1);
union (_sample), (_sample)

Output

x
24
24

Generate a sample of a certain percentage of data

To sample a certain percentage of your data (rather than a specified number of rows), you can use the rand() function:

StormEvents | where rand() < 0.1

Output

The table contains the first few rows of the output. Run the query to view the full result.

StartTimeEndTimeEpisodeIdEventIdStateEventType
2007-01-01T00:00:00Z2007-01-20T10:24:00Z240311914INDIANAFlood
2007-01-01T00:00:00Z2007-01-24T18:47:00Z240811930INDIANAFlood
2007-01-01T00:00:00Z2007-01-01T12:00:00Z197912631DELAWAREHeavy Rain
2007-01-01T00:00:00Z2007-01-01T00:00:00Z259213208NORTH CAROLINAThunderstorm Wind
2007-01-01T00:00:00Z2007-01-31T23:59:00Z14927069MINNESOTADrought
2007-01-01T00:00:00Z2007-01-31T23:59:00Z224010858TEXASDrought

Generate a sample of keys

To sample keys rather than rows (for example - sample 10 Ids and get all rows for these Ids), you can use sample-distinct in combination with the in operator.

let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents
| where EpisodeId in (sampleEpisodes)

Output

The table contains the first few rows of the output. Run the query to view the full result.

StartTimeEndTimeEpisodeIdEventIdStateEventType
2007-09-18T20:00:00Z2007-09-19T18:00:00Z1107460904FLORIDAHeavy Rain
2007-09-20T21:57:00Z2007-09-20T22:05:00Z1107860913FLORIDATornado
2007-09-29T08:11:00Z2007-09-29T08:11:00Z1109161032ATLANTIC SOUTHWaterspout
2007-12-07T14:00:00Z2007-12-08T04:00:00Z1318373241AMERICAN SAMOAFlash Flood
2007-12-11T21:45:00Z2007-12-12T16:45:00Z1282670787KANSASFlood
2007-12-13T09:02:00Z2007-12-13T10:30:00Z1178064725KENTUCKYFlood

15.34 - sample-distinct operator

Learn how to use the sample-distinct operator to return a column that contains up to the specified number of distinct values of the requested columns.

Returns a single column that contains up to the specified number of distinct values of the requested column.

The operator tries to return an answer as quickly as possible rather than trying to make a fair sample.

Syntax

T | sample-distinct NumberOfValues of ColumnName

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
NumberOfValuesint, long, or real✔️The number of distinct values of T to return. You can specify any numeric expression.
ColumnNamestring✔️The name of the column from which to sample.

Examples

The example in this section shows how to use the syntax to help you get started.

Get 10 distinct values from a population

StormEvents | sample-distinct 10 of EpisodeId

Output

EpisodeId
11074
11078
11749
12554
12561
13183
11780
11781
12826

Further compute the sample values

let sampleEpisodes = StormEvents | sample-distinct 10 of EpisodeId;
StormEvents 
| where EpisodeId in (sampleEpisodes) 
| summarize totalInjuries=sum(InjuriesDirect) by EpisodeId

Output

EpisodeIdtotalInjuries
110910
110740
110780
117490
125543
125610
131830
117800
117810
128260

15.35 - scan operator

Learn how to use the scan operator to scan data, match, and build sequences based on the predicates.

Scans data, matches, and builds sequences based on the predicates.

Matching records are determined according to predicates defined in the operator’s steps. A predicate can depend on the state that is generated by previous steps. The output for the matching record is determined by the input record and assignments defined in the operator’s steps.

Syntax

T | scan [ with_match_id = MatchIdColumnName ] [ declare ( ColumnDeclarations ) ] with ( StepDefinitions )

ColumnDeclarations syntax

ColumnName : ColumnType[= DefaultValue ] [, … ]

StepDefinition syntax

step StepName [ output = all | last | none] : Condition [ => Column = Assignment [, … ] ] ;

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular source.
MatchIdColumnNamestringThe name of a column of type long that is appended to the output as part of the scan execution. Indicates the 0-based index of the match for the record.
ColumnDeclarationsstringDeclares an extension to the schema of T. These columns are assigned values in the steps. If not assigned, the DefaultValue is returned. Unless otherwise specified, DefaultValue is null.
StepNamestring✔️Used to reference values in the state of scan for conditions and assignments. The step name must be unique.
Conditionstring✔️An expression that evaluates to true or false that defines which records from the input match the step. A record matches the step when the condition is true with the step’s state or with the previous step’s state.
AssignmentstringA scalar expression that is assigned to the corresponding column when a record matches a step.
outputstringControls the output logic of the step on repeated matches. all outputs all records matching the step, last outputs only the last record in a series of repeating matches for the step, and none doesn’t output records matching the step. The default is all.
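
As a minimal sketch of the output parameter (an illustrative query, not one of the examples below), the following scan uses output=last on its middle step, so only the last record of each run of consecutive records matching that step should be emitted; output=all would emit every matching record, and output=none would emit nothing for that step:

let Events = datatable (Ts: timespan, Event: string) [
    0m, "Start",
    1m, "A",
    2m, "A",
    3m, "Stop"
]
;
Events
| sort by Ts asc
| scan with 
(
    step s1: Event == "Start";
    step s2 output=last: Event != "Start" and Event != "Stop";
    step s3: Event == "Stop";
)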

Returns

A record for each match of a record from the input to a step. The schema of the output is the schema of the source extended with the column in the declare clause.

Scan logic

scan goes over the serialized input data, record by record, comparing each record against each step’s condition while taking into account the current state of each step.

State

The underlying state of the scan operator can be thought of as a table with a row for each step. Each step maintains its own state with the latest values of the columns and declared variables from all of the previous steps and the current step. If relevant, it also holds the match ID for the ongoing sequence.

If a scan operator has n steps named s_1, s_2, …, s_n then step s_k would have k records in its state corresponding to s_1, s_2, …, s_k. The StepName.ColumnName format is used to reference a value in the state. For instance, s_2.col1 would reference column col1 that belongs to step s_2 in the state of s_k. For a detailed example, see the scan logic walkthrough.

The state starts empty and updates whenever a scanned input record matches a step. When the state of the current step is nonempty, the step is referred to as having an active sequence.

Matching logic

Each input record is evaluated against all of the steps in reverse order, from the last step to the first. When a record r is evaluated against some step s_k, the following logic is applied:

  • Check 1: If the state of the previous step (s_k-1) is nonempty, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:

    1. The state of s_k is cleared.
    2. The state of s_k-1 is promoted to become the state of s_k.
    3. The assignments of s_k are calculated and extend r.
    4. The extended r is added to the output and to the state of s_k.

    Note: If Check 1 results in a match, Check 2 is disregarded, and r moves on to be evaluated against s_k-1.

  • Check 2: If the state of s_k has an active sequence or s_k is the first step, and r meets the Condition of s_k, then a match occurs. The match leads to the following actions:

    1. The assignments of s_k are calculated and extend r.
    2. The values that represent s_k in the state of s_k are replaced with the values of the extended r.
    3. If s_k is defined as output=all, the extended r is added to the output.
    4. If s_k is the first step, a new sequence begins and the match ID increases by 1. This only affects the output when with_match_id is used.

Once the checks for s_k are complete, r moves on to be evaluated against s_k-1.

For a detailed example of this logic, see the scan logic walkthrough.

Examples

The example in this section shows how to use the syntax to help you get started.

Cumulative sum

Calculate the cumulative sum for an input column. The result of this example is equivalent to using row_cumsum().

range x from 1 to 5 step 1 
| scan declare (cumulative_x:long=0) with 
(
    step s1: true => cumulative_x = x + s1.cumulative_x;
)

Output

xcumulative_x
11
23
36
410
515

Cumulative sum on multiple columns with a reset condition

Calculate the cumulative sum for two input columns, and reset the sum value to the current record value whenever the cumulative sum reaches 10 or more.

range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with 
(
    step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x), 
                     cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)

Output

xycumulative_xcumulative_y
1212
2436
36612
48108
510518

Fill forward a column

Fill forward a string column. Each empty value is assigned the last seen nonempty value.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "",
    2m, "B",
    3m, "",
    4m, "",
    6m, "C",
    8m, "",
    11m, "D",
    12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with 
(
    step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)

Output

TsEventEvent_filled
00:00:00AA
00:01:00A
00:02:00BB
00:03:00B
00:04:00B
00:06:00CC
00:08:00C
00:11:00DD
00:12:00D

Sessions tagging

Divide the input into sessions: a session ends 30 minutes after the first event of the session, after which a new session starts. Note the use of the with_match_id flag, which assigns a unique value for each distinct match (session) of scan. Also note the special use of two steps in this example: inSession has true as its condition, so it captures and outputs all the records from the input, while endSession captures records that occur more than 30m after the sessionStart value for the current match. The endSession step has output=none, meaning it doesn't produce output records. The endSession step is used to advance the state of the current match from inSession to endSession, allowing a new match (session) to begin, starting from the current record.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "A",
    2m, "B",
    3m, "D",
    32m, "B",
    36m, "C",
    38m, "D",
    41m, "E",
    75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with 
(
    step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
    step endSession output=none: Ts - inSession.sessionStart > 30m;
)

Output

TsEventsessionStartsession_id
00:00:00A00:00:000
00:01:00A00:00:000
00:02:00B00:00:000
00:03:00D00:00:000
00:32:00B00:32:001
00:36:00C00:32:001
00:38:00D00:32:001
00:41:00E00:32:001
01:15:00A01:15:002

Events between Start and Stop

Find all sequences of events between the event Start and the event Stop that occur within 5 minutes. Assign a match ID for each sequence.

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

Output

TsEventm_id
00:01:00Start0
00:02:00B0
00:03:00D0
00:04:00Stop0
00:08:00Start1
00:11:00E1
00:12:00Stop1

Calculate a custom funnel of events

Calculate a funnel completion of the sequence Hail -> Tornado -> Thunderstorm Wind by State with custom thresholds on the times between the events (Tornado within 1h and Thunderstorm Wind within 2h). This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.

StormEvents
| partition hint.strategy=native by State 
    (
    sort by StartTime asc
    | scan with 
    (
        step hail: EventType == "Hail";
        step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
        step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
    )
    )
| summarize dcount(State) by EventType

Output

EventTypedcount_State
Hail50
Tornado34
Thunderstorm Wind32

Scan logic walkthrough

This section demonstrates the scan logic using a step-by-step walkthrough of the Events between start and stop example:

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

Output

TsEventm_id
00:01:00Start0
00:02:00B0
00:03:00D0
00:04:00Stop0
00:08:00Start1
00:11:00E1
00:12:00Stop1

The state

Think of the state of the scan operator as a table with a row for each step, in which each step has its own state. This state contains the latest values of the columns and declared variables from all of the previous steps and the current step. To learn more, see State.

For this example, the state can be represented with the following table:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3

The “X” indicates that a specific field is irrelevant for that step.

The matching logic

This section follows the matching logic through each record of the Events table, explaining the transformation of the state and output at each step.

Record 1

TsEvent
0m“A”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start". Record 1 is discarded without affecting the state or output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3

Record 2

TsEvent
1m“Start”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence, and the m_id is assigned. Record 2 and its m_id (0) are added to the state and the output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1000:01:00“Start”XXXX
s2XX
s3

Record 3

TsEvent
2m“B”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 3 and its m_id (0) are added to the state and the output.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2000:01:00“Start”00:02:00“B”XX
s3

Record 4

TsEvent
3m“D”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the record doesn’t meet the condition of Event == "Stop", and Check 2 isn’t passed because s3 lacks an active sequence.
  • s2: Check 1 isn't passed because the state of s1 is empty. It passes Check 2 because it meets the condition of Ts - s1.Ts <= 5m. Record 4 and its m_id (0) are added to the state and the output. The values from this record overwrite the previous state values for s2.Ts and s2.Event.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2000:01:00“Start”00:03:00“D”XX
s3

Record 5

TsEvent
4m“Stop”

Record evaluation at each step:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 5 and its m_id (0) are added to the state and the output.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 6

TsEvent
6m“C”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because the record doesn’t meet the s3 condition of Event == "Stop".
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start". Record 6 is discarded without affecting the state or output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 7

TsEvent
8m“Start”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because it doesn’t meet the condition of Event == "Stop".
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step. Check 2 is passed because the record meets the condition of Event == "Start". This match initiates a new sequence in s1 with a new m_id. Record 7 and its m_id (1) are added to the state and the output.

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1100:08:00“Start”XXXX
s2XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 8

TsEvent
11m“E”

Record evaluation at each step:

  • s3: Check 1 isn’t passed because the state of s2 is empty, and Check 2 isn’t passed because it doesn’t meet the s3 condition of Event == "Stop".
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts <= 5m. This match causes the state of s1 to be cleared and the sequence in s1 to be promoted to s2. Record 8 and its m_id (1) are added to the state and the output.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2100:08:00“Start”00:11:00“E”XX
s3000:01:00“Start”00:03:00“D”00:04:00“Stop”

Record 9

TsEvent
12m“Stop”

Record evaluation at each step:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". This match causes the state of s2 to be cleared and the sequence in s2 to be promoted to s3. Record 9 and its m_id (1) are added to the state and the output.
  • s2: Check 1 isn’t passed because the state of s1 is empty, and Check 2 isn’t passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there’s no previous step, and Check 2 isn’t passed because the record doesn’t meet the condition of Event == "Start".

State:

stepm_ids1.Tss1.Events2.Tss2.Events3.Tss3.Event
s1XXXX
s2XX
s3100:08:00“Start”00:11:00“E”00:12:00“Stop”

Final output

TsEventm_id
00:01:00Start0
00:02:00B0
00:03:00D0
00:04:00Stop0
00:08:00Start1
00:11:00E1
00:12:00Stop1

15.36 - search operator

Learn how to use the search operator to search for a text pattern in multiple tables and columns.

Searches a text pattern in multiple tables and columns.

Syntax

[T |] search [kind= CaseSensitivity ] [in (TableSources)] SearchPredicate

Parameters

NameTypeRequiredDescription
TstringThe tabular data source to be searched over, such as a table name, a union operator, or the results of a tabular query. Can’t be specified together with TableSources.
CaseSensitivitystringA flag that controls the behavior of all string scalar operators, such as has, with respect to case sensitivity. Valid values are default, case_insensitive, case_sensitive. The options default and case_insensitive are synonymous, since the default behavior is case insensitive.
TableSourcesstringA comma-separated list of “wildcarded” table names to take part in the search. The list has the same syntax as the list of the union operator. Can’t be specified together with tabular data source (T).
SearchPredicatestring✔️A boolean expression to be evaluated for every record in the input. If it returns true, the record is outputted. See Search predicate syntax.

Search predicate syntax

The SearchPredicate allows you to search for specific terms in all columns of a table. The operator that is applied to a search term depends on the presence and placement of a wildcard asterisk (*) in the term, as shown in the following table.

LiteralOperator
billghas
*billghassuffix
billg*hasprefix
*billg*contains
bi*lgmatches regex

You can also restrict the search to a specific column, look for an exact match instead of a term match, or search by regular expression. The syntax for each of these cases is shown in the following table.

SyntaxExplanation
ColumnName:StringLiteralThis syntax can be used to restrict the search to a specific column. The default behavior is to search all columns.
ColumnName==StringLiteralThis syntax can be used to search for exact matches of a column against a string value. The default behavior is to look for a term-match.
Column matches regex StringLiteralThis syntax indicates regular expression matching, in which StringLiteral is the regex pattern.

Use boolean expressions to combine conditions and create more complex searches. For example, "error" and x==123 would result in a search for records that have the term error in any columns and the value 123 in the x column.
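For example, assuming the ContosoSales database used in the examples later in this article, the following sketch combines a term search with an exact column condition:

search "Green" and CustomerKey == 16549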

Search predicate syntax examples

#SyntaxMeaning (equivalent where)Comments
1search "err"where * has "err"
2search in (T1,T2,A*) "err"union T1,T2,A* | where * has “err”
3search col:"err"where col has "err"
4search col=="err"where col=="err"
5search "err*"where * hasprefix "err"
6search "*err"where * hassuffix "err"
7search "*err*"where * contains "err"
8search "Lab*PC"where * matches regex @"\bLab.*PC\b"
9search *where 0==0
10search col matches regex "..."where col matches regex "..."
11search kind=case_sensitiveAll string comparisons are case-sensitive
12search "abc" and ("def" or "hij")where * has "abc" and (* has "def" or * has hij")
13search "err" or (A>a and A<b)where * has "err" or (A>a and A<b)

Remarks

Unlike the find operator, the search operator doesn’t support the following syntax:

  1. withsource=: The output always includes a column called $table of type string whose value is the table name from which each record was retrieved (or some system-generated name if the source isn’t a table but a composite expression). See the example after this list.
  2. project=, project-smart: The output schema is equivalent to project-smart output schema.
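The $table column can be referenced like any other column in downstream operators. The following sketch, assuming the same ContosoSales database used in the examples below, counts the matching records per source table:

search "Green"
| summarize Matches = count() by $table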

Examples

The examples in this section show how to use the syntax to help you get started.

Search for the term Green in all the tables of the ContosoSales database.

The output finds records with the term Green as a last name or a color in the Customers, Products, and SalesTable tables.

 search "Green"

Output

$tableCityNameContinentNameCustomerKeyEducationFirstNameGenderLastName
CustomersBallardNorth America16549Partial CollegeMasonMGreen
CustomersBellinghamNorth America2070High SchoolAdamMGreen
CustomersBellinghamNorth America10658BachelorsSaraFGreen
CustomersBeverly HillsNorth America806Graduate DegreeRichardMGreen
CustomersBeverly HillsNorth America7674Graduate DegreeJamesMGreen
CustomersBurbankNorth America5241Graduate DegreeMadelineFGreen

Search for records that contain the term Green and one of either terms Deluxe or Proseware in the ContosoSales database.

search "Green" and ("Deluxe" or "Proseware")

Output

$tableProductNameManufacturerColorNameClassNameProductCategoryName
ProductsContoso 8GB Clock & Radio MP3 Player X850 GreenContoso, LtdGreenDeluxeAudio
ProductsProseware Scan Jet Digital Flat Bed Scanner M300 GreenProseware, Inc.GreenRegularComputers
ProductsProseware All-In-One Photo Printer M200 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Ink Jet Wireless All-In-One Printer M400 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Ink Jet Instant PDF Sheet-Fed Scanner M300 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Desk Jet All-in-One Printer, Scanner, Copier M350 GreenProseware, Inc.GreenRegularComputers
ProductsProseware Duplex Scanner M200 GreenProseware, Inc.GreenRegularComputers

Search a specific table

Search for the term Green only in the Products table.

search in (Products) "Green"

Output

$tableProductNameManufacturerColorName
ProductsContoso 4G MP3 Player E400 GreenContoso, LtdGreen
ProductsContoso 8GB Super-Slim MP3/Video Player M800 GreenContoso, LtdGreen
ProductsContoso 16GB Mp5 Player M1600 GreenContoso, LtdGreen
ProductsContoso 8GB Clock & Radio MP3 Player X850 GreenContoso, LtdGreen
ProductsNT Wireless Bluetooth Stereo Headphones M402 GreenNorthwind TradersGreen
ProductsNT Wireless Transmitter and Bluetooth Headphones M150 GreenNorthwind TradersGreen

Search for records that match the case-sensitive term in the ContosoSales database.

search kind=case_sensitive "blue"

Output

$tableProductNameManufacturerColorNameClassName
ProductsContoso 16GB New Generation MP5 Player M1650 blueContoso, LtdblueRegular
ProductsContoso Bright Light battery E20 blueContoso, LtdblueEconomy
ProductsLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy
NewSalesLitware 120mm Blue LED Case Fan E901 blueLitware, Inc.blueEconomy

Search specific columns

Search for the terms Aaron and Hughes, in the “FirstName” and “LastName” columns respectively, in the ContosoSales database.

search FirstName:"Aaron" or LastName:"Hughes"

Output

$tableCustomerKeyEducationFirstNameGenderLastName
Customers18285High SchoolRileyFHughes
Customers802Graduate DegreeAaronMSharma
Customers986BachelorsMelanieFHughes
Customers12669High SchoolJessicaFHughes
Customers13436Graduate DegreeMariahFHughes
Customers10152Graduate DegreeAaronMCampbell

Limit search by timestamp

Search for the term Hughes in the ContosoSales database, but only in records whose DateKey value is later than January 1, 2009.

search "Hughes" and DateKey > datetime('2009-01-01')

Output

$tableDateKeySalesAmount_real
SalesTable2021-12-13T00:00:00Z446.4715
SalesTable2021-12-13T00:00:00Z120.555
SalesTable2021-12-13T00:00:00Z48.4405
SalesTable2021-12-13T00:00:00Z39.6435
SalesTable2021-12-13T00:00:00Z56.9905

Performance Tips

#TipPreferOver
1Prefer to use a single search operator over several consecutive search operatorssearch "billg" and ("steveb" or "satyan")search “billg” | search “steveb” or “satyan”
2Prefer to filter inside the search operatorsearch "billg" and "steveb"search * | where * has “billg” and * has “steveb”

15.37 - serialize operator

Learn how to use the serialize operator to mark the input row set as serialized and ready for window functions.

Marks that the order of the input row set is safe to use for window functions.

The operator has a declarative meaning. It marks the input row set as serialized (ordered), so that window functions can be applied to it.

Syntax

serialize [Name1 = Expr1 [, Name2 = Expr2]…]

Parameters

NameTypeRequiredDescription
NamestringThe name of the column to add or update. If omitted, the output column name is automatically generated.
Exprstring✔️The calculation to perform over the input.

Examples

The examples in this section show how to use the syntax to help you get started.

Serialize subset of rows by condition

This query retrieves all log entries from the TraceLogs table that have a specific ClientRequestId and preserves the order of these entries during processing.

TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize

Output

This table only shows the top 5 query results.

TimestampNodeComponentClientRequestIdMessage
2014-03-08T12:24:55.5464757ZEngine000000000757INGESTOR_GATEWAY5a848f70-9996-eb17-15ed-21b8eb94bf0e$$IngestionCommand table=fogEvents format=json
2014-03-08T12:24:56.0929514ZEngine000000000757DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"
2014-03-08T12:25:40.3574831ZEngine000000000341INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_0.json.gz"
2014-03-08T12:25:40.9039588ZEngine000000000341DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"
2014-03-08T12:26:25.1684905ZEngine000000000057INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_1.json.gz"

Add row number to the serialized table

To add a row number to the serialized table, use the row_number() function.

TraceLogs
| where ClientRequestId == "5a848f70-9996-eb17-15ed-21b8eb94bf0e"
| serialize rn = row_number()

Output

This table only shows the top 5 query results.

TimestamprnNodeComponentClientRequestIdMessage
2014-03-08T13:00:01.6638235Z1Engine000000000899INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_46.json.gz"
2014-03-08T13:00:02.2102992Z2Engine000000000899DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"
2014-03-08T13:00:46.4748309Z3Engine000000000584INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_47.json.gz"
2014-03-08T13:00:47.0213066Z4Engine000000000584DOWNLOADER5a848f70-9996-eb17-15ed-21b8eb94bf0eDownloading file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"
2014-03-08T13:01:31.2858383Z5Engine000000000380INGESTOR_EXECUTER5a848f70-9996-eb17-15ed-21b8eb94bf0eIngestionCompletionEvent: finished ingestion file path: "https://benchmarklogs3.blob.core.windows.net/benchmark/2014/IMAGINEFIRST0_1399_48.json.gz"

Serialization behavior of operators

The output row set of the following operators is marked as serialized.

The output row set of the following operators is marked as nonserialized.

All other operators preserve the serialization property. If the input row set is serialized, then the output row set is also serialized.
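For example, because the sort operator marks its output as serialized, a window function such as prev() can be applied right after it. The following is a minimal sketch using the StormEvents sample table:

StormEvents
| sort by StartTime asc
| extend PrevEventType = prev(EventType)  // prev() requires a serialized input row set
| take 5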

15.38 - Shuffle query

This article describes Shuffle query.

The shuffle query is a semantic-preserving transformation used with a set of operators that support the shuffle strategy. Depending on the data involved, querying with the shuffle strategy can yield better performance. It’s better to use the shuffle query strategy when the shuffle key (a join key, summarize key, make-series key or partition key) has a high cardinality and the regular operator query hits query limits.

You can use the shuffle strategy with the following operators: join, summarize, make-series, and partition.

To use the shuffle query strategy, add the expression hint.strategy = shuffle or hint.shufflekey = <key>. When you use hint.strategy = shuffle, the operator data is shuffled by all of the keys. Use this expression when the compound key is unique but each individual key isn’t unique enough; the data is then shuffled using all the keys of the shuffled operator.

When partitioning data with the shuffle strategy, the data load is shared on all cluster nodes. Each node processes one partition of the data. The default number of partitions is equal to the number of cluster nodes.

You can override the number of partitions by using the syntax hint.num_partitions = total_partitions. This override is useful when the cluster has a small number of nodes, so the default number of partitions is also small, and the query fails or takes a long time to run.
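For example, the following sketch (using the StormEvents sample table) shuffles the summarize by State and overrides the default with 10 partitions:

StormEvents
| summarize hint.strategy = shuffle hint.num_partitions = 10 count() by State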

In some cases, the hint.strategy = shuffle is ignored, and the query won’t run in shuffle strategy. This can happen when:

  • The join operator has another shuffle-compatible operator (join, summarize, make-series or partition) on the left side or the right side.
  • The summarize operator appears after another shuffle-compatible operator (join, summarize, make-series or partition) in the query.

Syntax

With hint.strategy = shuffle

T | DataExpression | join hint.strategy = shuffle ( DataExpression )

T | summarize hint.strategy = shuffle DataExpression

T | Query | partition hint.strategy = shuffle ( SubQuery )

With hint.shufflekey = key

T | DataExpression | join hint.shufflekey = key ( DataExpression )

T | summarize hint.shufflekey = key DataExpression

T | make-series hint.shufflekey = key DataExpression

T | Query | partition hint.shufflekey = key ( SubQuery )

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular source whose data is to be processed by the operator.
DataExpressionstringAn implicit or explicit tabular transformation expression.
QuerystringA transformation expression run on the records of T.
keystringUse a join key, summarize key, make-series key or partition key.
SubQuerystringA transformation expression.

Examples

The examples in this section show how to use the syntax to help you get started.

Use summarize with shuffle

The shuffle strategy query with summarize operator shares the load on all cluster nodes, where each node processes one partition of the data.

StormEvents
| summarize hint.strategy = shuffle count(), avg(InjuriesIndirect) by State
| count 

Output

Count
67

Use join with shuffle

StormEvents
| where State has "West"
| where EventType has "Flood"
| join hint.strategy=shuffle 
    (
    StormEvents
    | where EventType has "Hail"
    | project EpisodeId, State, DamageProperty
    )
    on State
| count

Output

Count
103

Use make-series with shuffle

StormEvents
| where State has "North"
| make-series hint.shufflekey = State sum(DamageProperty) default = 0 on StartTime in range(datetime(2007-01-01 00:00:00.0000000), datetime(2007-01-31 23:59:00.0000000), 15d) by State

Output

Statesum_DamagePropertyStartTime
NORTH DAKOTA[60000,0,0][“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”]
NORTH CAROLINA[20000,0,1000][“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”]
ATLANTIC NORTH[0,0,0][“2006-12-31T00:00:00.0000000Z”,“2007-01-15T00:00:00.0000000Z”,“2007-01-30T00:00:00.0000000Z”]

Use partition with shuffle

StormEvents
| partition hint.strategy=shuffle by EpisodeId
(
    top 3 by DamageProperty
    | project EpisodeId, State, DamageProperty
)
| count

Output

Count
22345

Compare hint.strategy=shuffle and hint.shufflekey=key

When you use hint.strategy=shuffle, the shuffled operator will be shuffled by all the keys. In the following example, the query shuffles the data using both EpisodeId and EventId as keys:

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId
| count

Output

Count
14

The following query uses hint.shufflekey = key and is equivalent to the previous query.

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| join kind = inner hint.shufflekey = EpisodeId hint.shufflekey = EventId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId

Output

Count
14

Shuffle the data with multiple keys

In some cases, hint.strategy=shuffle is ignored, and the query won’t run in shuffle strategy. In the following example, the join has a summarize on its left side, so using hint.strategy=shuffle won’t apply the shuffle strategy to the query:

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.strategy=shuffle (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId

Output

EpisodeIdEventIdEpisodeId1EventId1
1030440710304407
103013721103013721
247712530247712530
210310237210310237
210310239210310239

To overcome this issue and run in shuffle strategy, choose the key that is common to the summarize and join operations. In this case, the common key is EpisodeId, so specify it as the shuffle key on the join with hint.shufflekey = EpisodeId:

StormEvents
| where StartTime > datetime(2007-01-01 00:00:00.0000000)
| summarize count() by EpisodeId, EventId
| join kind = inner hint.shufflekey=EpisodeId (StormEvents | where DamageCrops > 62000000) on EpisodeId, EventId

Output

EpisodeIdEventIdEpisodeId1EventId1
1030440710304407
103013721103013721
247712530247712530
210310237210310237
210310239210310239

Use summarize with shuffle to improve performance

In this example, using the summarize operator with shuffle strategy improves performance. The source table has 150M records and the cardinality of the group by key is 10M, which is spread over 10 cluster nodes.

Using the summarize operator without the shuffle strategy, the query ends after about 1 minute and 8 seconds, and the memory usage peak is ~3 GB:

orders
| summarize arg_max(o_orderdate, o_totalprice) by o_custkey 
| where o_totalprice < 1000
| count

Output

Count
1086

While using shuffle strategy with summarize, the query ends after ~7 seconds and the memory usage peak is 0.43 GB:

orders
| summarize hint.strategy = shuffle arg_max(o_orderdate, o_totalprice) by o_custkey 
| where o_totalprice < 1000
| count

Output

Count
1086

The following example demonstrates performance on a cluster that has two cluster nodes, with a table that has 60M records, where the cardinality of the group by key is 2M.

Running the query without hint.num_partitions uses only two partitions (matching the number of cluster nodes), and the following query takes about 1 minute and 10 seconds:

lineitem 
| summarize hint.strategy = shuffle dcount(l_comment), dcount(l_shipdate) by l_partkey 
| consume

Setting the number of partitions to 10, the query ends after about 23 seconds:

lineitem 
| summarize hint.strategy = shuffle hint.num_partitions = 10 dcount(l_comment), dcount(l_shipdate) by l_partkey 
| consume

Use join with shuffle to improve performance

The following example shows how using shuffle strategy with the join operator improves performance.

The examples were sampled on a cluster with 10 nodes where the data is spread over all these nodes.

The query’s left-side source table has 15M records where the cardinality of the join key is ~14M. The query’s right-side source has 150M records and the cardinality of the join key is 10M. The query ends after ~28 seconds and the memory usage peak is 1.43 GB:

customer
| join
    orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey

When using shuffle strategy with a join operator, the query ends after ~4 seconds and the memory usage peak is 0.3 GB:

customer
| join
    hint.strategy = shuffle orders
on $left.c_custkey == $right.o_custkey
| summarize sum(c_acctbal) by c_nationkey

In another example, we try the same queries on a larger dataset with the following conditions:

  • Left-side source of the join is 150M and the cardinality of the key is 148M.
  • Right-side source of the join is 1.5B, and the cardinality of the key is ~100M.

The query with just the join operator hits limits and times out after 4 minutes. However, when using the shuffle strategy with the join operator, the query ends after ~34 seconds and the memory usage peak is 1.23 GB.

The following example shows the improvement on a cluster that has two nodes, with a table of 60M records, where the cardinality of the join key is 2M. Running the query without hint.num_partitions uses only two partitions (matching the number of cluster nodes), and the following query takes about 1 minute and 10 seconds:

lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
    hint.shufflekey = l_partkey   part
on $left.l_partkey == $right.p_partkey
| consume

When the number of partitions is set to 10, the query ends after about 23 seconds:

lineitem
| summarize dcount(l_comment), dcount(l_shipdate) by l_partkey
| join
    hint.shufflekey = l_partkey  hint.num_partitions = 10    part
on $left.l_partkey == $right.p_partkey
| consume

15.39 - sort operator

Learn how to use the sort operator to sort the rows of the input table by one or more columns.

Sorts the rows of the input table into order by one or more columns.

Syntax

T | sort by column [asc | desc] [nulls first | nulls last] [, …]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to sort.
columnscalar✔️The column of T by which to sort. The type of the column values must be numeric, date, time or string.
asc or descstringasc sorts into ascending order, low to high. Default is desc, high to low.
nulls first or nulls laststringnulls first will place the null values at the beginning and nulls last will place the null values at the end. Default for asc is nulls first. Default for desc is nulls last.

Returns

A copy of the input table sorted in either ascending or descending order based on the provided column.

Using special floating-point values

When the input table contains the special values null, NaN, -inf and +inf, the order will be as follows:

ValueAscendingDescending
Nulls firstnull,NaN,-inf,-5,0,5,+infnull,NaN,+inf,5,0,-5,-inf
Nulls last-inf,-5,0,5,+inf,NaN,null+inf,5,0,-5,-inf,NaN,null

Example

The following example shows storm events by state in alphabetical order with the most recent storms in each state appearing first.

StormEvents
| sort by State asc, StartTime desc

Output

This table only shows the top 10 query results.

StartTimeStateEventType
2007-12-28T12:10:00ZALABAMAHail
2007-12-28T04:30:00ZALABAMAHail
2007-12-28T04:16:00ZALABAMAHail
2007-12-28T04:15:00ZALABAMAHail
2007-12-28T04:13:00ZALABAMAHail
2007-12-21T14:30:00ZALABAMAStrong Wind
2007-12-20T18:15:00ZALABAMAStrong Wind
2007-12-20T18:00:00ZALABAMAStrong Wind
2007-12-20T18:00:00ZALABAMAStrong Wind
2007-12-20T17:45:00ZALABAMAStrong Wind
2007-12-20T17:45:00ZALABAMAStrong Wind
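The ordering of null, NaN, and infinity values described earlier can be verified with a small datatable literal. This is a minimal sketch with hypothetical values:

datatable(x: real) [real(-inf), -5.0, 0.0, 5.0, real(+inf), real(nan), real(null)]
| sort by x asc nulls first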

15.40 - take operator

Learn how to use the take operator to return a specified number of rows.

Return up to the specified number of rows.

There is no guarantee which records are returned, unless the source data is sorted. If the data is sorted, then the top values will be returned.

Syntax

take NumberOfRows

Parameters

NameTypeRequiredDescription
NumberOfRowsint✔️The number of rows to return.

Paging of query results

Methods for implementing paging include:

  • Export the result of a query to an external storage and paging through the generated data.
  • Write a middle-tier application that provides a stateful paging API by caching the results of a Kusto query.
  • Use pagination in Stored query results

Example

StormEvents | take 5
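Because take alone provides no ordering guarantee, sort the data first when you need a deterministic result. A minimal sketch:

StormEvents
| sort by StartTime desc
| take 5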

15.41 - top operator

Learn how to use the top operator to return the first specified number of records sorted by the specified column.

Returns the first N records sorted by the specified column.

Syntax

T | top NumberOfRows by Expression [asc | desc] [nulls first | nulls last]

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input to sort.
NumberOfRowsint✔️The number of rows of T to return.
Expressionstring✔️The scalar expression by which to sort.
asc or descstringControls whether the selection is from the “bottom” or “top” of the range. Default desc.
nulls first or nulls laststringControls whether null values appear at the “bottom” or “top” of the range. Default for asc is nulls first. Default for desc is nulls last.

Example

Show top three storms with most direct injuries.

StormEvents
| top 3 by InjuriesDirect

The following table shows only the relevant column. Run the query above to see more storm details for these events.

InjuriesDirect
519
422
200
  • Use top-nested operator to produce hierarchical (nested) top results.

15.42 - top-hitters operator

Learn how to use the top-hitters operator to return an approximation for the most popular distinct values in the input.

Returns an approximation for the most popular distinct values, or the values with the largest sum, in the input.

Syntax

T | top-hitters NumberOfValues of ValueExpression [ by SummingExpression ]

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
NumberOfValuesint, long, or real✔️The number of distinct values of ValueExpression.
ValueExpressionstring✔️An expression over the input table T whose distinct values are returned.
SummingExpressionstringIf specified, a numeric expression over the input table T whose sum per distinct value of ValueExpression establishes which values to emit. If not specified, the count of each distinct value of ValueExpression is used instead.

Remarks

The first syntax (no SummingExpression) is conceptually equivalent to:

T | summarize C=count() by ValueExpression | top NumberOfValues by C desc

The second syntax (with SummingExpression) is conceptually equivalent to:

T | summarize S=sum(SummingExpression) by ValueExpression | top NumberOfValues by S desc

Examples

Get most frequent items

StormEvents
| top-hitters 5 of EventType 

Output

EventTypeapproximate_count_EventType
Thunderstorm Wind13015
Hail12711
Flash Flood3688
Drought3616
Winter Weather3349

Get top hitters based on column value

The next example shows how to find the States with the most “Thunderstorm Wind” events.

StormEvents
| where EventType == "Thunderstorm Wind"
| top-hitters 10 of State 

Output

Stateapproximate_sum_State
TEXAS830
GEORGIA609
MICHIGAN602
IOWA585
PENNSYLVANIA549
ILLINOIS533
NEW YORK502
VIRGINIA482
KANSAS476
OHIO455
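The SummingExpression form works the same way. As a sketch (not part of the original examples), the following approximates the five states with the largest total property damage:

StormEvents
| top-hitters 5 of State by DamageProperty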

15.43 - top-nested operator

Learn how to use the top-nested operator to produce a hierarchical aggregation.

The top-nested operator performs hierarchical aggregation and value selection.

Imagine you have a table with sales information like regions, salespeople, and amounts sold. The top-nested operator can help you answer complex questions, such as “What are the top five regions by sales, and who are the top three salespeople in each of those regions?”

The source data is partitioned based on the criteria set in the first top-nested clause, such as region. Next, the operator picks the top records in each partition using an aggregation, such as adding sales amounts. Each subsequent top-nested clause refines the partitions created by the previous clause, creating a hierarchy of more precise groups.

The result is a table with two columns per clause. One column holds the partitioning values, such as region, while the other column holds the outcomes of the aggregation calculation, like the sum of sales.

Syntax

T | top-nested [ N ] of Expr [with others = ConstExpr] by Aggregation [asc | desc] [,
  top-nested … ]

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
NintThe number of top values to be returned for this hierarchy level. If omitted, all distinct values are returned.
Exprstring✔️An expression over the input record indicating which value to return for this hierarchy level. Typically, it refers to a column from T or involves a calculation like bin() on a column. Optionally, set an output column name as Name = Expr.
ConstExprstringIf specified, for each hierarchy level, one record is added with the value that is the aggregation over all records that didn’t make it to the top.
AggregationstringThe aggregation function applied to records with the same Expr value. The result determines the top records. See Supported aggregation functions. Optionally, set an output column name as Name = Aggregation.

Supported aggregation functions

The following aggregation functions are supported:

Returns

A table with two columns for each clause. One column contains unique values computed using Expr, and the other column shows the results obtained from the Aggregation calculation.

Using the with others clause

Using the top-nested operator with with others adds the ability to see your top content contextualized in a wider data set. Evaluating your data in this way is valuable when rendering the data visually.

Include data from other columns

Only columns specified as a top-nested clause Expr are displayed in the output table.

To include all values of a column at a specific level:

  1. Don’t specify the value of N.
  2. Use the column name as the value of Expr.
  3. Use Ignore=max(1) as the value of Aggregation.
  4. Remove the unnecessary Ignore column with project-away.

For an example, see Most recent events per state with other column data.

Performance considerations

The number of records can grow exponentially with the number of top-nested clauses, and record growth is even faster if the N parameter is not specified. This operator can consume a considerable amount of resources.

If the aggregation distribution is irregular, limit the number of distinct values to return by specifying N. Then, use the with others = ConstExpr clause to get a sense of the weight of all other cases.

Examples

Top damaged states, event types, and end locations by property damage

The following query partitions the StormEvents table by the State column and calculates the total property damage for each state. The query selects the top two states with the largest amount of property damage. Within these top two states, the query groups the data by EventType and selects the top three event types with the most damage. Then the query groups the data by EndLocation and selects the EndLocation with the highest damage. Only one EndLocation value appears in the results, possibly due to the large nature of the storm events or not documenting the end location.

StormEvents  // Data source.
| top-nested 2 of State by sum(DamageProperty),       // Top 2 States by total damaged property.
  top-nested 3 of EventType by sum(DamageProperty),   // Top 3 EventType by total damaged property for each State.
  top-nested 1 of EndLocation by sum(DamageProperty)  // Top 1 EndLocation by total damaged property for each EventType and State.
| project State, EventType, EndLocation, StateTotalDamage = aggregated_State, EventTypeTotalDamage = aggregated_EventType, EndLocationDamage = aggregated_EndLocation

Output

StateEventTypeEndLocationStateTotalDamageEventTypeTotalDamageEndLocationDamage
CALIFORNIAWildfire144593760013263150001326315000
CALIFORNIAHighWind14459376006132000061320000
CALIFORNIADebrisFlow14459376004800000048000000
OKLAHOMAIceStorm915470300826000000826000000
OKLAHOMAWinterStorm9154703004002700040027000
OKLAHOMAFloodCOMMERCE9154703002148500020000000

Top five states with property damage with others grouped

The following example uses the top-nested operator to identify the top five states with the most property damage and uses the with others clause to group damaged property for all other states. It then visualizes damaged property for the top five states and all other states as a piechart using the render command.

StormEvents
| top-nested 5 of State with others="OtherStates" by sum(DamageProperty)
| render piechart  

Output

Screenshot of the top five states with the most property damaged, and all other states grouped separately rendered as a pie-chart.

Most recent events per state with other column data

The following query retrieves the two most recent events for each US state with relevant event details. It uses max(1) within certain columns to propagate data without using the top-nested selection logic. The generated Ignore aggregation columns are removed using project-away.

StormEvents
| top-nested of State by Ignore0=max(1),                  // Partition the data by each unique value of state.
  top-nested 2 of StartTime by Ignore1=max(StartTime),    // Get the 2 most recent events in each state.
  top-nested of EndTime by Ignore2=max(1),                // Append the EndTime for each event.
  top-nested of EpisodeId by Ignore3=max(1)               // Append the EpisodeId for each event.
| project-away Ignore*                                    // Remove the unnecessary aggregation columns.
| order by State asc, StartTime desc                      // Sort results alphabetically and chronologically.

Latest records per identity with other column data

The following top-nested example extracts the latest records per identity and builds on the concepts introduced in the previous example. The first top-nested clause partitions the data by distinct values of id using Ignore0=max(1) as a placeholder. For each id, it identifies the two most recent records based on the timestamp. Other information is appended using a top-nested operator without specifying a count and using Ignore2=max(1) as a placeholder. Finally, unnecessary aggregation columns are removed using the project-away operator.

datatable(id: string, timestamp: datetime, otherInformation: string) // Create a source datatable.
[
    "Barak", datetime(2015-01-01), "1",
    "Barak", datetime(2016-01-01), "2",
    "Barak", datetime(2017-01-20), "3",
    "Donald", datetime(2017-01-20), "4",
    "Donald", datetime(2017-01-18), "5",
    "Donald", datetime(2017-01-19), "6"
]
| top-nested of id by Ignore0=max(1),                     // Partition the data by each unique value of id.
  top-nested 2 of timestamp by Ignore1=max(timestamp),    // Get the 2 most recent events for each state.
  top-nested of otherInformation by Ignore2=max(1)        // Append otherInformation for each event.
| project-away Ignore0, Ignore1, Ignore2                  // Remove the unnecessary aggregation columns.

Output

idtimestampotherInformation
Barak2016-01-01T00:00:00Z2
Donald2017-01-19T00:00:00Z6
Barak2017-01-20T00:00:00Z3
Donald2017-01-20T00:00:00Z4

15.44 - union operator

This article describes union operator.

Takes two or more tables and returns the rows of all of them.

Syntax

[ T | ] union [ UnionParameters ] [kind= inner|outer] [withsource= ColumnName] [isfuzzy= true|false] Tables

[ T | ] union [kind= inner|outer] [withsource= ColumnName] [isfuzzy= true|false] Tables

Parameters

NameTypeRequiredDescription
TstringThe input tabular expression.
UnionParametersstringZero or more space-separated parameters in the form of Name = Value that control the behavior of the row-match operation and execution plan. See supported union parameters.
kindstringEither inner or outer. inner causes the result to have the subset of columns that are common to all of the input tables. outer causes the result to have all the columns that occur in any of the inputs. Cells that aren’t defined by an input row are set to null. The default is outer.

With outer, the result has all the columns that occur in any of the inputs, one column for each name and type occurrences. This means that if a column appears in multiple tables and has multiple types, it has a corresponding column for each type in the union’s result. This column name is suffixed with a ‘_’ followed by the origin column type.
withsource=ColumnNamestringIf specified, the output includes a column called ColumnName whose value indicates which source table has contributed each row. If the query effectively references tables from more than one database including the default database, then the value of this column has a table name qualified with the database. cluster and database qualifications are present in the value if more than one cluster is referenced.
isfuzzyboolIf set to true, allows fuzzy resolution of union legs. The set of union sources is reduced to the set of table references that exist and are accessible at the time while analyzing the query and preparing for execution. If at least one such table was found, any resolution failure yields a warning in the query status results, but won’t prevent the query execution. If no resolutions were successful, the query returns an error. The default is false.

isfuzzy=true only applies to the union sources resolution phase. Once the set of source tables is determined, possible additional query failures won’t be suppressed.
TablesstringOne or more comma-separated table references, a query expression enclosed with parenthesis, or a set of tables specified with a wildcard. For example, E* would form the union of all the tables in the database whose names begin E.

Supported union parameters

NameTypeRequiredDescription
hint.concurrencyintHints the system how many concurrent subqueries of the union operator should be executed in parallel. The default is the number of CPU cores on the single node of the cluster (2 to 16).
hint.spreadintHints the system how many nodes should be used by the concurrent union subqueries execution. The default is 1.
NameTypeRequiredDescription
TstringThe input tabular expression.
kindstringEither inner or outer. inner causes the result to have the subset of columns that are common to all of the input tables. outer causes the result to have all the columns that occur in any of the inputs. Cells that aren’t defined by an input row are set to null. The default is outer.

With outer, the result has all the columns that occur in any of the inputs, one column for each name and type occurrences. This means that if a column appears in multiple tables and has multiple types, it has a corresponding column for each type in the union’s result. This column name is suffixed with a ‘_’ followed by the origin column type.
withsource=ColumnNamestringIf specified, the output includes a column called ColumnName whose value indicates which source table has contributed each row. If the query effectively references tables from more than one database including the default database, then the value of this column has a table name qualified with the database. cluster and database qualifications are present in the value if more than one cluster is referenced.
isfuzzyboolIf set to true, allows fuzzy resolution of union legs. The set of union sources is reduced to the set of table references that exist and are accessible at the time while analyzing the query and preparing for execution. If at least one such table was found, any resolution failure yields a warning in the query status results, but won’t prevent the query execution. If no resolutions were successful, the query returns an error. However, in cross-workspace and cross-app queries, if any of the workspaces or apps is not found, the query will fail. The default is false.

isfuzzy=true only applies to the union sources resolution phase. Once the set of source tables is determined, possible additional query failures won’t be suppressed.
TablesstringOne or more comma-separated table references, a query expression enclosed with parenthesis, or a set of tables specified with a wildcard. For example, E* would form the union of all the tables in the database whose names begin E.

Whenever the list of tables is known, refrain from using wildcards. Some workspaces contain a very large number of tables, which would lead to inefficient execution. Tables may also be added over time, leading to unpredictable results.

Returns

A table with as many rows as there are in all the input tables.

Examples

Tables with string in name or column

union K* | where * has "Kusto"

Rows from all tables in the database whose name starts with K, and in which any column includes the word Kusto.

Distinct count

union withsource=SourceTable kind=outer Query, Command
| where Timestamp > ago(1d)
| summarize dcount(UserId)

The number of distinct users that have produced either a Query event or a Command event over the past day. In the result, the ‘SourceTable’ column will indicate either “Query” or “Command”.

Query
| where Timestamp > ago(1d)
| union withsource=SourceTable kind=outer 
   (Command | where Timestamp > ago(1d))
| summarize dcount(UserId)

This more efficient version produces the same result. It filters each table before creating the union.

Using isfuzzy=true

// Using union isfuzzy=true to access non-existing view:                                     
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true
(View_1 | where x > 0), 
(View_2 | where x > 0),
(View_3 | where x > 0)
| count 

Output

Count
2

Observing the query status, the following warning is returned: Failed to resolve entity 'View_3'

// Using union isfuzzy=true and wildcard access:
let View_1 = view () { print x=1 };
let View_2 = view () { print x=1 };
let OtherView_1 = view () { print x=1 };
union isfuzzy=true View*, SomeView*, OtherView*
| count 

Output

Count
3

Observing the query status, the following warning is returned: Failed to resolve entity 'SomeView*'

Source columns types mismatch

let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
union withsource=TableName View_1, View_2

Output

TableNamex_longx_int
View_11
View_22
let View_1 = view () { print x=1 };
let View_2 = view () { print x=toint(2) };
let View_3 = view () { print x_long=3 };
union withsource=TableName View_1, View_2, View_3 

Output

TableNamex_long1x_intx_long
View_11
View_22
View_33

Column x from View_1 received the suffix _long. Because a column named x_long already exists in the result schema, the column names were de-duplicated, producing a new column, x_long1.

15.45 - where operator

Learn how to use the where operator to filter a table to the subset of rows that satisfy a predicate.

Filters a table to the subset of rows that satisfy a predicate.

Syntax

T | where Predicate

Parameters

NameTypeRequiredDescription
Tstring✔️Tabular input whose records are to be filtered.
Predicatestring✔️Expression that evaluates to a bool for each row in T.

Returns

Rows in T for which Predicate is true.

Performance tips

  • Use simple comparisons between column names and constants. (‘Constant’ means constant over the table - so now() and ago() are OK, and so are scalar values assigned using a let statement.)

    For example, prefer where Timestamp >= ago(1d) to where bin(Timestamp, 1d) == ago(1d).

  • Simplest terms first: If you have multiple clauses conjoined with and, put first the clauses that involve just one column. So Timestamp > ago(1d) and OpId == EventId is better than the other way around.

For more information, see the summary of available String operators and the summary of available Numerical operators.
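As a brief illustration of these tips, the following sketch (using the StormEvents sample table) binds a constant with a let statement and keeps the single-column comparisons simple:

let cutoff = datetime(2007-12-01);
StormEvents
| where StartTime >= cutoff and DamageProperty > 0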

Examples

Order comparisons by complexity

The following query returns storm records that report damaged property, are floods, and start and end in different places.

Notice that we put the comparison between two columns last, as the where operator can’t use the index and forces a scan.

StormEvents
| project DamageProperty, EventType, BeginLocation, EndLocation
| where DamageProperty > 0
    and EventType == "Flood"
    and BeginLocation != EndLocation 

The following table only shows the top 10 results. To see the full output, run the query.

DamagePropertyEventTypeBeginLocationEndLocation
5000FloodFAYETTE CITY LOWBER
5000FloodMORRISVILLE WEST WAYNESBURG
10000FloodCOPELAND HARRIS GROVE
5000FloodGLENFORD MT PERRY
25000FloodEAST SENECA BUFFALO AIRPARK ARPT
20000FloodEBENEZER SLOAN
10000FloodBUEL CALHOUN
10000FloodGOODHOPE WEST MILFORD
5000FloodDUNKIRK FOREST
20000FloodFARMINGTON MANNINGTON

Check if column contains string

The following query returns the rows in which the word “cow” appears in any column.

StormEvents
| where * has "cow"

16 - Time series analysis

16.1 - Example use cases

16.1.1 - Analyze time series data

Learn how to analyze time series data.

Cloud services and IoT devices generate telemetry data that can be used to gain insights such as monitoring service health, physical production processes, and usage trends. Performing time series analysis is one way to identify deviations in the pattern of these metrics compared to their typical baseline pattern.

Kusto Query Language (KQL) contains native support for creation, manipulation, and analysis of multiple time series. In this article, learn how KQL is used to create and analyze thousands of time series in seconds, enabling near real-time monitoring solutions and workflows.

Time series creation

In this section, we’ll create a large set of regular time series simply and intuitively using the make-series operator, and fill-in missing values as needed. The first step in time series analysis is to partition and transform the original telemetry table to a set of time series. The table usually contains a timestamp column, contextual dimensions, and optional metrics. The dimensions are used to partition the data. The goal is to create thousands of time series per partition at regular time intervals.

The input table demo_make_series1 contains 600K records of arbitrary web service traffic. Use the following command to sample 10 records:

demo_make_series1 | take 10 

The resulting table contains a timestamp column, three contextual dimensions columns, and no metrics:

TimeStampBrowserVerOsVerCountry/Region
2016-08-25 09:12:35.4020000Chrome 51.0Windows 7United Kingdom
2016-08-25 09:12:41.1120000Chrome 52.0Windows 10
2016-08-25 09:12:46.2300000Chrome 52.0Windows 7United Kingdom
2016-08-25 09:12:46.5100000Chrome 52.0Windows 10United Kingdom
2016-08-25 09:12:46.5570000Chrome 52.0Windows 10Republic of Lithuania
2016-08-25 09:12:47.0470000Chrome 52.0Windows 8.1India
2016-08-25 09:12:51.3600000Chrome 52.0Windows 10United Kingdom
2016-08-25 09:12:51.6930000Chrome 52.0Windows 7Netherlands
2016-08-25 09:12:56.4240000Chrome 52.0Windows 10United Kingdom
2016-08-25 09:13:08.7230000Chrome 52.0Windows 10India

Since there are no metrics, we can only build a set of time series representing the traffic count itself, partitioned by OS using the following query:

let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| render timechart 
  • Use the make-series operator to create a set of three time series, where:
    • num=count(): time series of traffic
    • from min_t to max_t step 1h: the time series is created in 1-hour bins over the time range (oldest and newest timestamps of the table records)
    • default=0: specify the fill method for missing bins to create regular time series
    • by OsVer: partition by OS version
  • The actual time series data structure is a numeric array of the aggregated value per each time bin. We use render timechart for visualization.

In the table above, we have three partitions. We can create a separate time series: Windows 10 (red), 7 (blue) and 8.1 (green) for each OS version as seen in the graph:

Time series partition.

Time series analysis functions

In this section, we’ll perform typical series processing functions. Once a set of time series is created, KQL supports a growing list of functions to process and analyze them. We’ll describe a few representative functions for processing and analyzing time series.

Filtering

Filtering is a common practice in signal processing and useful for time series processing tasks (for example, smooth a noisy signal, change detection).

  • There are two generic filtering functions:
    • series_fir(): Applying FIR filter. Used for simple calculation of moving average and differentiation of the time series for change detection.
    • series_iir(): Applying IIR filter. Used for exponential smoothing and cumulative sum.
  • Extend the time series set by adding a new moving average series of size 5 bins (named ma_num) to the query:
let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| extend ma_num=series_fir(num, repeat(1, 5), true, true)
| render timechart

Time series filtering.
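series_iir() follows the same pattern. As a sketch, using the coefficients b=[1], a=[1,-1] that configure the filter as a cumulative sum, the following adds a running total of the traffic count:

let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| extend cum_num=series_iir(num, dynamic([1]), dynamic([1, -1]))  // IIR filter configured as a cumulative sum
| render timechart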

Regression analysis

A segmented linear regression analysis can be used to estimate the trend of the time series.

  • Use series_fit_line() to fit the best line to a time series for general trend detection.
  • Use series_fit_2lines() to detect trend changes, relative to the baseline, that are useful in monitoring scenarios.

Example of series_fit_line() and series_fit_2lines() functions in a time series query:

demo_series2
| extend series_fit_2lines(y), series_fit_line(y)
| render linechart with(xcolumn=x)

Time series regression.

  • Blue: original time series
  • Green: fitted line
  • Red: two fitted lines

Seasonality detection

Many metrics follow seasonal (periodic) patterns. User traffic of cloud services usually contains daily and weekly patterns that are highest around the middle of the business day and lowest at night and over the weekend. IoT sensors measure in periodic intervals. Physical measurements such as temperature, pressure, or humidity may also show seasonal behavior.

The following example applies seasonality detection on one month traffic of a web service (2-hour bins):

demo_series3
| render timechart 

Time series seasonality.

demo_series3
| project (periods, scores) = series_periods_detect(num, 0., 14d/2h, 2) //to detect the periods in the time series
| mv-expand periods, scores
| extend days=2h*todouble(periods)/1d
periodsscoresdays
840.8206227860555957
120.7646014058035021

The function detects daily and weekly seasonality. The daily score is less than the weekly one because weekend days differ from weekdays.

Element-wise functions

Arithmetic and logical operations can be done on a time series. Using series_subtract() we can calculate a residual time series, that is, the difference between original raw metric and a smoothed one, and look for anomalies in the residual signal:

let min_t = toscalar(demo_make_series1 | summarize min(TimeStamp));
let max_t = toscalar(demo_make_series1 | summarize max(TimeStamp));
demo_make_series1
| make-series num=count() default=0 on TimeStamp from min_t to max_t step 1h by OsVer
| extend ma_num=series_fir(num, repeat(1, 5), true, true)
| extend residual_num=series_subtract(num, ma_num) //to calculate residual time series
| where OsVer == "Windows 10"   // filter on Win 10 to visualize a cleaner chart 
| render timechart

Time series operations.

  • Blue: original time series
  • Red: smoothed time series
  • Green: residual time series

Time series workflow at scale

The example below shows how these functions can run at scale on thousands of time series in seconds for anomaly detection. To see a few sample telemetry records of a DB service’s read count metric over four days run the following query:

demo_many_series1
| take 4 
TIMESTAMPLocOpDBDataRead
2016-09-11 21:00:00.0000000Loc 951178539340496300892620
2016-09-11 21:00:00.0000000Loc 951178539340496300892410
2016-09-11 21:00:00.0000000Loc 9-865998331941149874262279862
2016-09-11 21:00:00.0000000Loc 93719217345637834102550

And simple statistics:

demo_many_series1
| summarize num=count(), min_t=min(TIMESTAMP), max_t=max(TIMESTAMP) 
nummin_tmax_t
21774722016-09-08 00:00:00.00000002016-09-11 23:00:00.0000000

Building a time series in 1-hour bins of the read metric (total four days * 24 hours = 96 points), results in normal pattern fluctuation:

let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));  
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));  
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h
| render timechart with(ymin=0) 

Time series at scale.

The above behavior is misleading, since the single normal time series is aggregated from thousands of different instances that may have abnormal patterns. Therefore, we create a time series per instance. An instance is defined by Loc (location), Op (operation), and DB (specific machine).

How many time series can we create?

demo_many_series1
| summarize by Loc, Op, DB
| count
Count
18339

Now, we’re going to create a set of 18339 time series of the read count metric. We add the by clause to the make-series statement, apply linear regression, and select the top two time series that had the most significant decreasing trend:

let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));  
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));  
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h by Loc, Op, DB
| extend (rsquare, slope) = series_fit_line(reads)
| top 2 by slope asc 
| render timechart with(title='Service Traffic Outage for 2 instances (out of 18339)')

Time series top two.

Display the instances:

let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));  
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));  
demo_many_series1
| make-series reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h by Loc, Op, DB
| extend (rsquare, slope) = series_fit_line(reads)
| top 2 by slope asc
| project Loc, Op, DB, slope 
Loc | Op | DB | slope
Loc 15 | 37 | 1151 | -102743.910227889
Loc 13 | 37 | 1249 | -86303.2334644601

In less than two minutes, close to 20,000 time series were analyzed and two abnormal time series in which the read count suddenly dropped were detected.

These advanced capabilities combined with fast performance supply a unique and powerful solution for time series analysis.

16.1.2 - Anomaly diagnosis for root cause analysis

Use machine learning clustering for Root Cause Analysis.

Kusto Query Language (KQL) has built-in anomaly detection and forecasting functions to check for anomalous behavior. Once such a pattern is detected, a Root Cause Analysis (RCA) can be run to mitigate or resolve the anomaly.

The diagnosis process is complex and lengthy, and done by domain experts. The process includes:

  • Fetching and joining more data from different sources for the same time frame
  • Looking for changes in the distribution of values on multiple dimensions
  • Charting more variables
  • Other techniques based on domain knowledge and intuition

Since these diagnosis scenarios are common, machine learning plugins are available to make the diagnosis phase easier, and shorten the duration of the RCA.

All three of the following Machine Learning plugins implement clustering algorithms: autocluster, basket, and diffpatterns. The autocluster and basket plugins cluster a single record set, and the diffpatterns plugin clusters the differences between two record sets.

Clustering a single record set

A common scenario includes a dataset selected by a specific criteria such as:

  • Time window that shows anomalous behavior
  • High temperature device readings
  • Long duration commands
  • Top spending users

You want a fast and easy way to find common patterns (segments) in the data. Patterns are a subset of the dataset whose records share the same values over multiple dimensions (categorical columns).

The following query builds and shows a time series of service exceptions over the period of a week, in ten-minute bins:

let min_t = toscalar(demo_clustering1 | summarize min(PreciseTimeStamp));  
let max_t = toscalar(demo_clustering1 | summarize max(PreciseTimeStamp));  
demo_clustering1
| make-series num=count() on PreciseTimeStamp from min_t to max_t step 10m
| render timechart with(title="Service exceptions over a week, 10 minutes resolution")

Service exceptions timechart.

The service exception count correlates with the overall service traffic. You can clearly see the daily pattern for business days, Monday to Friday. There’s a rise in service exception counts at mid-day, and drops in counts during the night. Flat low counts are visible over the weekend. Exception spikes can be detected using time series anomaly detection.

The second spike in the data occurs on Tuesday afternoon. The following query is used to further diagnose and verify whether it’s a sharp spike. The query redraws an eight-hour window around the spike at a higher resolution of one-minute bins, so you can study its borders.

let min_t=datetime(2016-08-23 11:00);
demo_clustering1
| make-series num=count() on PreciseTimeStamp from min_t to min_t+8h step 1m
| render timechart with(title="Zoom on the 2nd spike, 1 minute resolution")

Focus on spike timechart.

You see a narrow two-minute spike from 15:00 to 15:02. In the following query, count the exceptions in this two-minute window:

let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| count
Count
972

In the following query, sample 20 exceptions out of 972:

let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| take 20
PreciseTimeStamp | Region | ScaleUnit | DeploymentId | Tracepoint | ServiceHost
2016-08-23 15:00:08.7302460 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 100005 | 00000000-0000-0000-0000-000000000000
2016-08-23 15:00:09.9496584 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 8d257da1-7a1c-44f5-9acd-f9e02ff507fd
2016-08-23 15:00:10.5911748 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 100005 | 00000000-0000-0000-0000-000000000000
2016-08-23 15:00:12.2957912 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007007 | f855fcef-ebfe-405d-aaf8-9c5e2e43d862
2016-08-23 15:00:18.5955357 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 9d390e07-417d-42eb-bebd-793965189a28
2016-08-23 15:00:20.7444854 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 6e54c1c8-42d3-4e4e-8b79-9bb076ca71f1
2016-08-23 15:00:23.8694999 | eus2 | su2 | 89e2f62a73bb4efd8f545aeae40d7e51 | 36109 | 19422243-19b9-4d85-9ca6-bc961861d287
2016-08-23 15:00:26.4271786 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | 36109 | 3271bae4-1c5b-4f73-98ef-cc117e9be914
2016-08-23 15:00:27.8958124 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 90449 | 8cf38575-fca9-48ca-bd7c-21196f6d6765
2016-08-23 15:00:32.9884969 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 10007007 | d5c7c825-9d46-4ab7-a0c1-8e2ac1d83ddb
2016-08-23 15:00:34.5061623 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 1002110 | 55a71811-5ec4-497a-a058-140fb0d611ad
2016-08-23 15:00:37.4490273 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 10007006 | f2ee8254-173c-477d-a1de-4902150ea50d
2016-08-23 15:00:41.2431223 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 103200 | 8cf38575-fca9-48ca-bd7c-21196f6d6765
2016-08-23 15:00:47.2983975 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | 423690590 | 00000000-0000-0000-0000-000000000000
2016-08-23 15:00:50.5932834 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007006 | 2a41b552-aa19-4987-8cdd-410a3af016ac
2016-08-23 15:00:50.8259021 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 1002110 | 0d56b8e3-470d-4213-91da-97405f8d005e
2016-08-23 15:00:53.2490731 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 36109 | 55a71811-5ec4-497a-a058-140fb0d611ad
2016-08-23 15:00:57.0000946 | eus2 | su2 | 89e2f62a73bb4efd8f545aeae40d7e51 | 64038 | cb55739e-4afe-46a3-970f-1b49d8ee7564
2016-08-23 15:00:58.2222707 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | 10007007 | 8215dcf6-2de0-42bd-9c90-181c70486c9c
2016-08-23 15:00:59.9382620 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | 10007006 | 451e3c4c-0808-4566-a64d-84d85cf30978

Use autocluster() for single record set clustering

Even though there are fewer than a thousand exceptions, it’s still hard to find common segments, since there are multiple values in each column. You can use the autocluster() plugin to instantly extract a short list of common segments and find the interesting clusters within the spike’s two minutes, as seen in the following query:

let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| evaluate autocluster()
SegmentId | Count | Percent | Region | ScaleUnit | DeploymentId | ServiceHost
0 | 639 | 65.7407407407407 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | e7f60c5d-4944-42b3-922a-92e98a8e7dec
1 | 94 | 9.67078189300411 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 |
2 | 82 | 8.43621399176955 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e |
3 | 68 | 6.99588477366255 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 |
4 | 55 | 5.65843621399177 | weu | su4 | be1d6d7ac9574cbc9a22cb8ee20f16fc |

You can see from the results above that the most dominant segment contains 65.74% of the total exception records and shares four dimensions. The next segment is much less common. It contains only 9.67% of the records, and shares three dimensions. The other segments are even less common.

Autocluster uses a proprietary algorithm for mining multiple dimensions and extracting interesting segments. “Interesting” means that each segment has significant coverage of both the record set and the feature set. The segments are also divergent, meaning that each one differs from the others. One or more of these segments might be relevant for the RCA process. To minimize segment review and assessment, autocluster extracts only a small segment list.
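
autocluster also accepts optional tuning arguments. As a minimal sketch (assuming the default positional parameter order, in which the first argument is the size weight), increasing the size weight makes the plugin prefer larger, more generic segments over highly specific ones:

let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| evaluate autocluster(0.8) // size weight of 0.8 instead of the default 0.5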

Use basket() for single record set clustering

You can also use the basket() plugin as seen in the following query:

let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| evaluate basket()
SegmentId | Count | Percent | Region | ScaleUnit | DeploymentId | Tracepoint | ServiceHost
0 | 639 | 65.7407407407407 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | | e7f60c5d-4944-42b3-922a-92e98a8e7dec
1 | 642 | 66.0493827160494 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | |
2 | 324 | 33.3333333333333 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | 0 | e7f60c5d-4944-42b3-922a-92e98a8e7dec
3 | 315 | 32.4074074074074 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 | 16108 | e7f60c5d-4944-42b3-922a-92e98a8e7dec
4 | 328 | 33.7448559670782 | | | | 0 |
5 | 94 | 9.67078189300411 | scus | su5 | 9dbd1b161d5b4779a73cf19a7836ebd6 | |
6 | 82 | 8.43621399176955 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e | |
7 | 68 | 6.99588477366255 | scus | su3 | 90d3d2fc7ecc430c9621ece335651a01 | |
8 | 167 | 17.1810699588477 | scus | | | |
9 | 55 | 5.65843621399177 | weu | su4 | be1d6d7ac9574cbc9a22cb8ee20f16fc | |
10 | 92 | 9.46502057613169 | | | | 10007007 |
11 | 90 | 9.25925925925926 | | | | 10007006 |
12 | 57 | 5.8641975308642 | | | | | 00000000-0000-0000-0000-000000000000

Basket implements the “Apriori” algorithm for item set mining. It extracts all segments whose coverage of the record set is above a threshold (default 5%). You can see that more segments were extracted, some of them similar to each other, such as segments 0 and 1, or 2 and 3.
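
The coverage threshold can be passed as the first argument to basket. As a minimal sketch (assuming the default positional parameter order), raising it to 20% keeps only the broader segments:

let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
demo_clustering1
| where PreciseTimeStamp between(min_peak_t..max_peak_t)
| evaluate basket(0.2) // only segments covering at least 20% of the records are returned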

Both plugins are powerful and easy to use. Their limitation is that they cluster a single record set in an unsupervised manner with no labels. It’s unclear whether the extracted patterns characterize the selected record set, anomalous records, or the global record set.

Clustering the difference between two records sets

The diffpatterns() plugin overcomes the limitation of autocluster and basket. Diffpatterns takes two record sets and extracts the main segments that are different. One set usually contains the anomalous record set being investigated, which is the one analyzed by autocluster and basket. The other set contains the reference record set, the baseline.

In the following query, diffpatterns finds interesting clusters within the spike’s two minutes, which are different from the clusters within the baseline. The baseline window is defined as the eight minutes before 15:00, when the spike started. You extend by a binary column (AB), and specify whether a specific record belongs to the baseline or to the anomalous set. Diffpatterns implements a supervised learning algorithm, where the two class labels were generated by the anomalous versus the baseline flag (AB).

let min_peak_t=datetime(2016-08-23 15:00);
let max_peak_t=datetime(2016-08-23 15:02);
let min_baseline_t=datetime(2016-08-23 14:50);
let max_baseline_t=datetime(2016-08-23 14:58); // Leave a gap between the baseline and the spike to avoid the transition zone.
let splitime=(max_baseline_t+min_peak_t)/2.0;
demo_clustering1
| where (PreciseTimeStamp between(min_baseline_t..max_baseline_t)) or
        (PreciseTimeStamp between(min_peak_t..max_peak_t))
| extend AB=iff(PreciseTimeStamp > splitime, 'Anomaly', 'Baseline')
| evaluate diffpatterns(AB, 'Anomaly', 'Baseline')
SegmentId | CountA | CountB | PercentA | PercentB | PercentDiffAB | Region | ScaleUnit | DeploymentId | Tracepoint
0 | 639 | 21 | 65.74 | 1.7 | 64.04 | eau | su7 | b5d1d4df547d4a04ac15885617edba57 |
1 | 167 | 544 | 17.18 | 44.16 | 26.97 | scus | | |
2 | 92 | 356 | 9.47 | 28.9 | 19.43 | | | | 10007007
3 | 90 | 336 | 9.26 | 27.27 | 18.01 | | | | 10007006
4 | 82 | 318 | 8.44 | 25.81 | 17.38 | ncus | su1 | e24ef436e02b4823ac5d5b1465a9401e |
5 | 55 | 252 | 5.66 | 20.45 | 14.8 | weu | su4 | be1d6d7ac9574cbc9a22cb8ee20f16fc |
6 | 57 | 204 | 5.86 | 16.56 | 10.69 | | | |

The most dominant segment is the same segment that was extracted by autocluster. Its coverage on the two-minute anomalous window is also 65.74%. However, its coverage on the eight-minute baseline window is only 1.7%. The difference is 64.04%. This difference seems to be related to the anomalous spike. To verify this assumption, the following query splits the original chart into the records that belong to this problematic segment, and records from the other segments.

let min_t = toscalar(demo_clustering1 | summarize min(PreciseTimeStamp));  
let max_t = toscalar(demo_clustering1 | summarize max(PreciseTimeStamp));  
demo_clustering1
| extend seg = iff(Region == "eau" and ScaleUnit == "su7" and DeploymentId == "b5d1d4df547d4a04ac15885617edba57"
and ServiceHost == "e7f60c5d-4944-42b3-922a-92e98a8e7dec", "Problem", "Normal")
| make-series num=count() on PreciseTimeStamp from min_t to max_t step 10m by seg
| render timechart

Validating diffpatterns segment timechart.

This chart allows us to see that the spike on Tuesday afternoon was because of exceptions from this specific segment, discovered by using the diffpatterns plugin.

Summary

The Machine Learning plugins are helpful for many scenarios. The autocluster and basket plugins implement an unsupervised learning algorithm and are easy to use. Diffpatterns implements a supervised learning algorithm and, although more complex, is more powerful for extracting differentiation segments for RCA.

These plugins are used interactively in ad-hoc scenarios and in automatic near real-time monitoring services. Time series anomaly detection is followed by a diagnosis process. The process is highly optimized to meet necessary performance standards.

16.1.3 - Time series anomaly detection & forecasting

Learn how to analyze time series data for anomaly detection and forecasting.

Cloud services and IoT devices generate telemetry data that can be used to gain insights such as monitoring service health, physical production processes, and usage trends. Performing time series analysis is one way to identify deviations in the pattern of these metrics compared to their typical baseline pattern.

Kusto Query Language (KQL) contains native support for creation, manipulation, and analysis of multiple time series. With KQL, you can create and analyze thousands of time series in seconds, enabling near real time monitoring solutions and workflows.

This article details time series anomaly detection and forecasting capabilities of KQL. The applicable time series functions are based on a robust well-known decomposition model, where each original time series is decomposed into seasonal, trend, and residual components. Anomalies are detected by outliers on the residual component, while forecasting is done by extrapolating the seasonal and trend components. The KQL implementation significantly enhances the basic decomposition model by automatic seasonality detection, robust outlier analysis, and vectorized implementation to process thousands of time series in seconds.

Prerequisites

  • A Microsoft account or a Microsoft Entra user identity. An Azure subscription isn’t required.
  • Read Time series analysis for an overview of time series capabilities.

Time series decomposition model

The KQL native implementation for time series prediction and anomaly detection uses a well-known decomposition model. This model is applied to time series of metrics expected to manifest periodic and trend behavior, such as service traffic, component heartbeats, and IoT periodic measurements to forecast future metric values and detect anomalous ones. The assumption of this regression process is that other than the previously known seasonal and trend behavior, the time series is randomly distributed. You can then forecast future metric values from the seasonal and trend components, collectively named baseline, and ignore the residual part. You can also detect anomalous values based on outlier analysis using only the residual portion. To create a decomposition model, use the function series_decompose(). The series_decompose() function takes a set of time series and automatically decomposes each time series to its seasonal, trend, residual, and baseline components.

For example, you can decompose traffic of an internal web service by using the following query:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (baseline, seasonal, trend, residual) = series_decompose(num, -1, 'linefit')  //  decomposition of a set of time series to seasonal, trend, residual, and baseline (seasonal+trend)
| render timechart with(title='Web app. traffic of a month, decomposition', ysplit=panels)

Time series decomposition.

  • The original time series is labeled num (in red).
  • The process starts by auto detection of the seasonality by using the function series_periods_detect() and extracts the seasonal pattern (in purple).
  • The seasonal pattern is subtracted from the original time series and a linear regression is run using the function series_fit_line() to find the trend component (in light blue).
  • The function subtracts the trend and the remainder is the residual component (in green).
  • Finally, the function adds the seasonal and trend components to generate the baseline (in blue).

Time series anomaly detection

The function series_decompose_anomalies() finds anomalous points on a set of time series. This function calls series_decompose() to build the decomposition model and then runs series_outliers() on the residual component. series_outliers() calculates anomaly scores for each point of the residual component using Tukey’s fence test. Anomaly scores above 1.5 or below -1.5 indicate a mild anomaly rise or decline respectively. Anomaly scores above 3.0 or below -3.0 indicate a strong anomaly.
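
To get a feel for the scoring itself, you can run series_outliers() directly on a small array. This minimal sketch (not tied to any demo table) shows a single spike receiving a score far above the 1.5 mild-anomaly threshold:

print samples = dynamic([1, 2, 1, 2, 25, 1, 2, 1])
| extend outlier_scores = series_outliers(samples) // the value 25 gets a large positive score; the other points stay near 0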

The following query allows you to detect anomalies in internal web service traffic:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') //use "| render anomalychart with anomalycolumns=anomalies" to render the anomalies as bold points on the series charts.

Time series anomaly detection.

  • The original time series (in red).
  • The baseline (seasonal + trend) component (in blue).
  • The anomalous points (in purple) on top of the original time series. The anomalous points significantly deviate from the expected baseline values.

Time series forecasting

The function series_decompose_forecast() predicts future values of a set of time series. This function calls series_decompose() to build the decomposition model and then, for each time series, extrapolates the baseline component into the future.

The following query allows you to predict next week’s web service traffic:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend forecast = series_decompose_forecast(num, toint(horizon/dt))
| render timechart with(title='Web app. traffic of a month, forecasting the next week by Time Series Decomposition')

Time series forecasting.

  • Original metric (in red). Future values are missing and set to 0, by default.
  • Extrapolate the baseline component (in blue) to predict next week’s values.

Scalability

Kusto Query Language syntax enables a single call to process multiple time series. Its unique optimized implementation allows for fast performance, which is critical for effective anomaly detection and forecasting when monitoring thousands of counters in near real-time scenarios.

The following query shows the processing of three time series simultaneously:

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
let horizon=7d;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t+horizon step dt by sid
| extend offset=case(sid=='TS3', 4000000, sid=='TS2', 2000000, 0)   //  add artificial offset for easy visualization of multiple time series
| extend num=series_add(num, offset)
| extend forecast = series_decompose_forecast(num, toint(horizon/dt))
| render timechart with(title='Web app. traffic of a month, forecasting the next week for 3 time series')

Time series scalability.

Summary

This document details native KQL functions for time series anomaly detection and forecasting. Each original time series is decomposed into seasonal, trend and residual components for detecting anomalies and/or forecasting. These functionalities can be used for near real-time monitoring scenarios, such as fault detection, predictive maintenance, and demand and load forecasting.

16.2 - make-series operator

Learn how to use the make-series operator to create a series of specified aggregated values along a specified axis.

Create series of specified aggregated values along a specified axis.

Syntax

T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, …] on AxisColumn [from start] [to end] step step [by [Column =] GroupExpression [, …]]

Parameters

Name | Type | Required | Description
Column | string | | The name for the result column. Defaults to a name derived from the expression.
DefaultValue | scalar | | A default value to use instead of absent values. If there’s no row with specific values of AxisColumn and GroupExpression, then the corresponding element of the array will be assigned a DefaultValue. Default is 0.
Aggregation | string | ✔️ | A call to an aggregation function, such as count() or avg(), with column names as arguments. See the list of aggregation functions. Only aggregation functions that return numeric results can be used with the make-series operator.
AxisColumn | string | ✔️ | The column by which the series will be ordered. Usually the column values will be of type datetime or timespan, but all numeric types are accepted.
start | scalar | ✔️ | The low bound value of the AxisColumn for each of the series to be built. If start is not specified, it will be the first bin, or step, that has data in each series.
end | scalar | ✔️ | The high bound non-inclusive value of the AxisColumn. The last index of the time series is smaller than this value and will be start plus an integer multiple of step that is smaller than end. If end is not specified, it will be the upper bound of the last bin, or step, that has data per each series.
step | scalar | ✔️ | The difference, or bin size, between two consecutive elements of the AxisColumn array. For a list of possible time intervals, see timespan.
GroupExpression | | | An expression over the columns that provides a set of distinct values. Typically it’s a column name that already provides a restricted set of values.
MakeSeriesParameters | | | Zero or more space-separated parameters in the form of Name = Value that control the behavior. See supported make series parameters.

Supported make series parameters

Name | Description
kind | Produces default result when the input of make-series operator is empty. Value: nonempty
hint.shufflekey=<key> | The shufflekey query shares the query load on cluster nodes, using a key to partition data. See shuffle query
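
For example, hint.shufflekey can be used to spread the make-series work across cluster nodes. The following is a minimal sketch against the demo_many_series1 table used earlier in this article; choosing Loc as the shuffle key is only an assumption for illustration:

let min_t = toscalar(demo_many_series1 | summarize min(TIMESTAMP));
let max_t = toscalar(demo_many_series1 | summarize max(TIMESTAMP));
demo_many_series1
| make-series hint.shufflekey = Loc reads=avg(DataRead) on TIMESTAMP from min_t to max_t step 1h by Loc, Op, DB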

Alternate Syntax

T | make-series [Column =] Aggregation [default = DefaultValue] [, …] on AxisColumn in range(start, stop, step) [by [Column =] GroupExpression [, …]]

The generated series from the alternate syntax differs from the main syntax in two aspects:

  • The stop value is inclusive.
  • The index axis is binned with bin() and not bin_at(), which means that start might not be included in the generated series (see the sketch below).

It’s recommended to use the main syntax of make-series and not the alternate syntax.
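
The difference between the two binning functions is easiest to see on a single value. The following minimal sketch (not tied to any table) shows that bin() snaps to a fixed origin, while bin_at() anchors the bins to a supplied reference point, which is how the main syntax guarantees that start is included:

print value = datetime(2017-01-01 02:30)
| extend default_bin = bin(value, 1d)                                 // snaps to a fixed origin: 2017-01-01 00:00
| extend anchored_bin = bin_at(value, 1d, datetime(2016-12-31 12:00)) // anchored to the reference point: 2016-12-31 12:00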

Returns

The input rows are arranged into groups having the same values of the by expressions and the bin_at(AxisColumn,step,start) expression. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by columns, AxisColumn column and also at least one column for each computed aggregate. (Aggregations over multiple columns or non-numeric results aren’t supported.)

This intermediate result has as many rows as there are distinct combinations of by and bin_at(AxisColumn,step,start) values.

Finally, the rows from the intermediate result are arranged into groups having the same values of the by expressions, and all aggregated values are arranged into arrays (values of dynamic type). For each aggregation, there’s one column containing its array with the same name. The last column is an array containing the values of AxisColumn binned according to the specified step.

List of aggregation functions

Function | Description
avg() | Returns an average value across the group
avgif() | Returns an average with the predicate of the group
count() | Returns a count of the group
countif() | Returns a count with the predicate of the group
dcount() | Returns an approximate distinct count of the group elements
dcountif() | Returns an approximate distinct count with the predicate of the group
max() | Returns the maximum value across the group
maxif() | Returns the maximum value with the predicate of the group
min() | Returns the minimum value across the group
minif() | Returns the minimum value with the predicate of the group
percentile() | Returns the percentile value across the group
take_any() | Returns a random non-empty value for the group
stdev() | Returns the standard deviation across the group
sum() | Returns the sum of the elements within the group
sumif() | Returns the sum of the elements with the predicate of the group
variance() | Returns the variance across the group

List of series analysis functions

Function | Description
series_fir() | Applies Finite Impulse Response filter
series_iir() | Applies Infinite Impulse Response filter
series_fit_line() | Finds a straight line that is the best approximation of the input
series_fit_line_dynamic() | Finds a line that is the best approximation of the input, returning dynamic object
series_fit_2lines() | Finds two lines that are the best approximation of the input
series_fit_2lines_dynamic() | Finds two lines that are the best approximation of the input, returning dynamic object
series_outliers() | Scores anomaly points in a series
series_periods_detect() | Finds the most significant periods that exist in a time series
series_periods_validate() | Checks whether a time series contains periodic patterns of given lengths
series_stats_dynamic() | Return multiple columns with the common statistics (min/max/variance/stdev/average)
series_stats() | Generates a dynamic value with the common statistics (min/max/variance/stdev/average)

For a complete list of series analysis functions, see: Series processing functions

List of series interpolation functions

Function | Description
series_fill_backward() | Performs backward fill interpolation of missing values in a series
series_fill_const() | Replaces missing values in a series with a specified constant value
series_fill_forward() | Performs forward fill interpolation of missing values in a series
series_fill_linear() | Performs linear interpolation of missing values in a series
  • Note: Interpolation functions by default assume null as a missing value. Therefore specify default=double(null) in make-series if you intend to use interpolation functions for the series.
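
For example, the following minimal sketch (using a small inline datatable, in the spirit of the examples below) sets default=double(null) so that series_fill_linear() can interpolate the empty bins instead of leaving them as zeros:

let data = datatable(timestamp:datetime, metric:real)
[
  datetime(2017-01-01), 4,
  datetime(2017-01-03), 6,
  datetime(2017-01-05), 8,
];
data
| make-series avg(metric) default=double(null) on timestamp from datetime(2017-01-01) to datetime(2017-01-06) step 1d
| extend filled_metric = series_fill_linear(avg_metric) // the empty bins (Jan 2 and Jan 4) are interpolated to 5 and 7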

Examples

A table that shows arrays of the average prices of each fruit from each supplier, ordered by the timestamp over the specified range. There’s a row in the output for each distinct combination of fruit and supplier. The output columns show the fruit, supplier, and arrays of the average price and the whole timeline (from 2016-09-10 until 2016-09-13). All arrays are sorted by the respective timestamp and all gaps are filled with default values (0 in this example). All other input columns are ignored.

T | make-series PriceAvg=avg(Price) default=0
on Purchase from datetime(2016-09-10) to datetime(2016-09-13) step 1d by Supplier, Fruit

Three tables. The first lists raw data, the second has only distinct supplier-fruit-date combinations, and the third contains the make-series results.

let data=datatable(timestamp:datetime, metric: real)
[
  datetime(2016-12-31T06:00), 50,
  datetime(2017-01-01), 4,
  datetime(2017-01-02), 3,
  datetime(2017-01-03), 4,
  datetime(2017-01-03T03:00), 6,
  datetime(2017-01-05), 8,
  datetime(2017-01-05T13:40), 13,
  datetime(2017-01-06), 4,
  datetime(2017-01-07), 3,
  datetime(2017-01-08), 8,
  datetime(2017-01-08T21:00), 8,
  datetime(2017-01-09), 2,
  datetime(2017-01-09T12:00), 11,
  datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| make-series avg(metric) on timestamp from stime to etime step interval 
avg_metric | timestamp
[ 4.0, 3.0, 5.0, 0.0, 10.5, 4.0, 3.0, 8.0, 6.5 ] | [ "2017-01-01T00:00:00.0000000Z", "2017-01-02T00:00:00.0000000Z", "2017-01-03T00:00:00.0000000Z", "2017-01-04T00:00:00.0000000Z", "2017-01-05T00:00:00.0000000Z", "2017-01-06T00:00:00.0000000Z", "2017-01-07T00:00:00.0000000Z", "2017-01-08T00:00:00.0000000Z", "2017-01-09T00:00:00.0000000Z" ]

When the input to make-series is empty, the default behavior of make-series produces an empty result.

let data=datatable(timestamp:datetime, metric: real)
[
  datetime(2016-12-31T06:00), 50,
  datetime(2017-01-01), 4,
  datetime(2017-01-02), 3,
  datetime(2017-01-03), 4,
  datetime(2017-01-03T03:00), 6,
  datetime(2017-01-05), 8,
  datetime(2017-01-05T13:40), 13,
  datetime(2017-01-06), 4,
  datetime(2017-01-07), 3,
  datetime(2017-01-08), 8,
  datetime(2017-01-08T21:00), 8,
  datetime(2017-01-09), 2,
  datetime(2017-01-09T12:00), 11,
  datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| take 0
| make-series avg(metric) default=1.0 on timestamp from stime to etime step interval 
| count 

Output

Count
0

Using kind=nonempty in make-series will produce a non-empty result of the default values:

let data=datatable(timestamp:datetime, metric: real)
[
  datetime(2016-12-31T06:00), 50,
  datetime(2017-01-01), 4,
  datetime(2017-01-02), 3,
  datetime(2017-01-03), 4,
  datetime(2017-01-03T03:00), 6,
  datetime(2017-01-05), 8,
  datetime(2017-01-05T13:40), 13,
  datetime(2017-01-06), 4,
  datetime(2017-01-07), 3,
  datetime(2017-01-08), 8,
  datetime(2017-01-08T21:00), 8,
  datetime(2017-01-09), 2,
  datetime(2017-01-09T12:00), 11,
  datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| take 0
| make-series kind=nonempty avg(metric) default=1.0 on timestamp from stime to etime step interval 

Output

avg_metric | timestamp
[
1.0,
1.0,
1.0,
1.0,
1.0,
1.0,
1.0,
1.0,
1.0
]
[
“2017-01-01T00:00:00.0000000Z”,
“2017-01-02T00:00:00.0000000Z”,
“2017-01-03T00:00:00.0000000Z”,
“2017-01-04T00:00:00.0000000Z”,
“2017-01-05T00:00:00.0000000Z”,
“2017-01-06T00:00:00.0000000Z”,
“2017-01-07T00:00:00.0000000Z”,
“2017-01-08T00:00:00.0000000Z”,
“2017-01-09T00:00:00.0000000Z”
]

16.3 - series_abs()

Learn how to use the series_abs() function to calculate the element-wise absolute value of the numeric series input.

Calculates the element-wise absolute value of the numeric series input.

Syntax

series_abs(series)

Parameters

Name | Type | Required | Description
series | dynamic | ✔️ | An array of numeric values over which the absolute value function is applied.

Returns

Dynamic array of calculated absolute value. Any non-numeric element yields a null element value.

Example

print arr = dynamic([-6.5,0,8.2])
| extend arr_abs = series_abs(arr)

Output

arr | arr_abs
[-6.5,0,8.2] | [6.5,0,8.2]

16.4 - series_acos()

Learn how to use the series_acos() function to calculate the element-wise arccosine function of the numeric series input.

Calculates the element-wise arccosine function of the numeric series input.

Syntax

series_acos(series)

Parameters

Name | Type | Required | Description
series | dynamic | ✔️ | An array of numeric values over which the arccosine function is applied.

Returns

Dynamic array of calculated arccosine function values. Any non-numeric element yields a null element value.

Example

print arr = dynamic([-1,0,1])
| extend arr_acos = series_acos(arr)

Output

arr | arr_acos
[-1,0,1] | [3.1415926535897931,1.5707963267948966,0.0]

16.5 - series_add()

Learn how to use the series_add() function to calculate the element-wise addition of two numeric series inputs.

Calculates the element-wise addition of two numeric series inputs.

Syntax

series_add(series1, series2)

Parameters

Name | Type | Required | Description
series1, series2 | dynamic | ✔️ | The numeric arrays to be element-wise added into a dynamic array result.

Returns

Dynamic array of calculated element-wise add operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_add_s2 = series_add(s1, s2)

Output

s1 | s2 | s1_add_s2
[1,2,4] | [4,2,1] | [5,4,5]
[2,4,8] | [8,4,2] | [10,8,10]
[3,6,12] | [12,6,3] | [15,12,15]
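
As noted in the Returns section, arrays of different sizes yield null for the missing elements. A minimal sketch of that case (the expected output is based only on that note):

print s1 = dynamic([1, 2, 3]), s2 = dynamic([10, 20])
| extend s1_add_s2 = series_add(s1, s2) // expected: [11, 22, null], because s2 has no third element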

16.6 - series_atan()

Learn how to use the series_atan() function to calculate the element-wise arctangent of the numeric series input.

Calculates the element-wise arctangent function of the numeric series input.

Syntax

series_atan(series)

Parameters

Name | Type | Required | Description
series | dynamic | ✔️ | An array of numeric values over which the arctangent function is applied.

Returns

Dynamic array of calculated arctangent function values. Any non-numeric element yields a null element value.

Example

print arr = dynamic([-1,0,1])
| extend arr_atan = series_atan(arr)

Output

arr | arr_atan
[-1,0,1] | [-0.78539816339744828,0.0,0.78539816339744828]

16.7 - series_cos()

Learn how to use the series_cos() function to calculate the element-wise cosine function of the numeric series input.

Calculates the element-wise cosine function of the numeric series input.

Syntax

series_cos(series)

Parameters

Name | Type | Required | Description
series | dynamic | ✔️ | An array of numeric values over which the cosine function is applied.

Returns

Dynamic array of calculated cosine function values. Any non-numeric element yields a null element value.

Example

print arr = dynamic([-1,0,1])
| extend arr_cos = series_cos(arr)

Output

arr | arr_cos
[-1,0,1] | [0.54030230586813976,1.0,0.54030230586813976]

16.8 - series_cosine_similarity()

This article describes series_cosine_similarity().

Calculate the cosine similarity of two numerical vectors.

The function series_cosine_similarity() takes two numeric series as input, and calculates their cosine similarity.

Syntax

series_cosine_similarity(series1, series2 [, magnitude1 [, magnitude2]])

Parameters

Name | Type | Required | Description
series1, series2 | dynamic | ✔️ | Input arrays with numeric data.
magnitude1, magnitude2 | real | | Optional magnitude of the first and the second vectors respectively. The magnitude is the square root of the dot product of the vector with itself. If the magnitude isn’t provided, it will be calculated.

Returns

Returns a value of type real whose value is the cosine similarity of series1 with series2. If the two series aren’t of equal length, the longer series is truncated to the length of the shorter one. Any non-numeric element of the input series is ignored.

Example


datatable(s1:dynamic, s2:dynamic)
[
    dynamic([0.1,0.2,0.1,0.2]), dynamic([0.11,0.2,0.11,0.21]),
    dynamic([0.1,0.2,0.1,0.2]), dynamic([1,2,3,4]),
]
| extend cosine_similarity=series_cosine_similarity(s1, s2)
s1 | s2 | cosine_similarity
[0.1,0.2,0.1,0.2] | [0.11,0.2,0.11,0.21] | 0.99935343825504
[0.1,0.2,0.1,0.2] | [1,2,3,4] | 0.923760430703401
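
As a sanity check, the same result can be reproduced manually from series_dot_product() and the vector magnitudes. This is only an illustrative sketch of the underlying formula, not an alternative API:

print v1 = dynamic([0.1, 0.2, 0.1, 0.2]), v2 = dynamic([0.11, 0.2, 0.11, 0.21])
| extend built_in = series_cosine_similarity(v1, v2)
| extend manual = series_dot_product(v1, v2) / (sqrt(series_dot_product(v1, v1)) * sqrt(series_dot_product(v2, v2))) // matches built_in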

16.9 - series_decompose_anomalies()

Learn how to use series_decompose_anomalies() function to extract anomalous points from a dynamic numerical array.

Anomaly Detection is based on series decomposition. For more information, see series_decompose().

The function takes an expression containing a series (dynamic numerical array) as input, and extracts anomalous points with scores.

Syntax

series_decompose_anomalies (Series, [ Threshold, Seasonality, Trend, Test_points, AD_method, Seasonality_threshold ])

Parameters

Name | Type | Required | Description
Series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators.
Threshold | real | | The anomaly threshold. The default is 1.5, k value, for detecting mild or stronger anomalies.
Seasonality | int | | Controls the seasonal analysis. The possible values are:

- -1: Autodetect seasonality using series_periods_detect. This is the default value.
- Integer time period: A positive integer specifying the expected period in number of bins. For example, if the series is in 1h bins, a weekly period is 168 bins.
- 0: No seasonality, so skip extracting this component.
Trend | string | | Controls the trend analysis. The possible values are:

- avg: Define trend component as average(x). This is the default.
- linefit: Extract trend component using linear regression.
- none: No trend, so skip extracting this component.
Test_points | int | | A positive integer specifying the number of points at the end of the series to exclude from the learning, or regression, process. This parameter should be set for forecasting purposes. The default value is 0.
AD_method | string | | Controls the anomaly detection method on the residual time series, containing one of the following values:

- ctukey: Tukey’s fence test with custom 10th-90th percentile range. This is the default.
- tukey: Tukey’s fence test with standard 25th-75th percentile range.

For more information on residual time series, see series_outliers.
Seasonality_threshold | real | | The threshold for seasonality score when Seasonality is set to autodetect. The default score threshold is 0.6.

For more information, see series_periods_detect.

Returns

The function returns the following respective series:

  • ad_flag: A ternary series containing (+1, -1, 0) marking up/down/no anomaly respectively
  • ad_score: Anomaly score
  • baseline: The predicted value of the series, according to the decomposition

The algorithm

This function follows these steps:

  1. Calls series_decompose() with the respective parameters, to create the baseline and residuals series.
  2. Calculates ad_score series by applying series_outliers() with the chosen anomaly detection method on the residuals series.
  3. Calculates the ad_flag series by applying the threshold on the ad_score to mark up/down/no anomaly respectively.

Examples

Detect anomalies in weekly seasonality

In the following example, generate a series with weekly seasonality, and then add some outliers to it. series_decompose_anomalies autodetects the seasonality and generates a baseline that captures the repetitive pattern. The outliers you added can be clearly spotted in the ad_score component.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)) // generate a series with weekly seasonality
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y)
| render timechart  

Weekly seasonality showing baseline and outliers.

Detect anomalies in weekly seasonality with trend

In this example, add a trend to the series from the previous example. First, run series_decompose_anomalies with the default parameters in which the trend avg default value only takes the average and doesn’t compute the trend. The generated baseline doesn’t contain the trend and is less exact, compared to the previous example. Consequently, some of the outliers you inserted in the data aren’t detected because of the higher variance.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y)
| extend series_decompose_anomalies_y_ad_flag = 
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart

Weekly seasonality outliers with trend.

Next, run the same example, but since you’re expecting a trend in the series, specify linefit in the trend parameter. You can see that the baseline is much closer to the input series. All the inserted outliers are detected, and also some false positives. See the next example on tweaking the threshold.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y, 1.5, -1, 'linefit')
| extend series_decompose_anomalies_y_ad_flag = 
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart  

Weekly seasonality anomalies with linefit trend.

Tweak the anomaly detection threshold

A few noisy points were detected as anomalies in the previous example. Now increase the anomaly detection threshold from the default of 1.5 to 2.5 times the interpercentile range, so that only stronger anomalies are detected. Now, only the outliers you inserted in the data are detected.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose_anomalies(y, 2.5, -1, 'linefit')
| extend series_decompose_anomalies_y_ad_flag = 
series_multiply(10, series_decompose_anomalies_y_ad_flag) // multiply by 10 for visualization purposes
| render timechart  

Weekly series anomalies with higher anomaly threshold.

16.10 - series_decompose_forecast()

Learn how to use the series_decompose_forecast() function to predict the value of the last trailing points.

Forecast based on series decomposition.

Takes an expression containing a series (dynamic numerical array) as input, and predicts the values of the last trailing points. For more information, see series_decompose.

Syntax

series_decompose_forecast(Series, Points, [ Seasonality, Trend, Seasonality_threshold ])

Parameters

Name | Type | Required | Description
Series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators.
Points | int | ✔️ | Specifies the number of points at the end of the series to predict, or forecast. These points are excluded from the learning, or regression, process.
Seasonality | int | | Controls the seasonal analysis. The possible values are:

- -1: Autodetect seasonality using series_periods_detect. This is the default value.
- Period: A positive integer specifying the expected period in number of bins. For example, if the series is in 1h bins, a weekly period is 168 bins.
- 0: No seasonality, so skip extracting this component.
Trend | string | | Controls the trend analysis. The possible values are:

- avg: Define trend component as average(x). This is the default.
- linefit: Extract trend component using linear regression.
- none: No trend, so skip extracting this component.
Seasonality_threshold | real | | The threshold for seasonality score when Seasonality is set to autodetect. The default score threshold is 0.6.

For more information, see series_periods_detect.

Returns

A dynamic array with the forecasted series.

Example

In the following example, we generate a series of four weeks in an hourly grain, with weekly seasonality and a small upward trend. We then use make-series and add another empty week to the series. series_decompose_forecast is called with a week (24*7 points), and it automatically detects the seasonality and trend, and generates a forecast of the entire five-week period.

let ts=range t from 1 to 24*7*4 step 1 // generate 4 weeks of hourly data
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| make-series y=max(y) on Timestamp from datetime(2018-03-01 05:00) to datetime(2018-03-01 05:00)+24*7*5h step 1h; // create a time series of 5 weeks (last week is empty)
ts 
| extend y_forcasted = series_decompose_forecast(y, 24*7)  // forecast a week forward
| render timechart 

Series decompose forecast.

16.11 - series_decompose()

Learn how to use the series_decompose() function to apply a decomposition transformation on a series.

Applies a decomposition transformation on a series.

Takes an expression containing a series (dynamic numerical array) as input and decomposes it to seasonal, trend, and residual components.

Syntax

series_decompose(Series , [ Seasonality, Trend, Test_points, Seasonality_threshold ])

Parameters

Name | Type | Required | Description
Series | dynamic | ✔️ | An array of numeric values, typically the resulting output of make-series or make_list operators.
Seasonality | int | | Controls the seasonal analysis. The possible values are:

- -1: Autodetect seasonality using series_periods_detect. This is the default value.
- Period: A positive integer specifying the expected period in number of bins. For example, if the series is in 1h bins, a weekly period is 168 bins.
- 0: No seasonality, so skip extracting this component.
Trend | string | | Controls the trend analysis. The possible values are:

- avg: Define trend component as average(x). This is the default.
- linefit: Extract trend component using linear regression.
- none: No trend, so skip extracting this component.
Test_points | int | | A positive integer specifying the number of points at the end of the series to exclude from the learning, or regression, process. This parameter should be set for forecasting purposes. The default value is 0.
Seasonality_threshold | real | | The threshold for seasonality score when Seasonality is set to autodetect. The default score threshold is 0.6.

For more information, see series_periods_detect.

Returns

The function returns the following respective series:

  • baseline: the predicted value of the series (sum of seasonal and trend components, see below).
  • seasonal: the series of the seasonal component:
    • if the period isn’t detected or is explicitly set to 0: constant 0.
    • if detected or set to positive integer: median of the series points in the same phase
  • trend: the series of the trend component.
  • residual: the series of the residual component (that is, x - baseline).

More about series decomposition

This method is usually applied to time series of metrics expected to manifest periodic and/or trend behavior. You can use the method to forecast future metric values and/or detect anomalous values. The implicit assumption of this regression process is that apart from seasonal and trend behavior, the time series is stochastic and randomly distributed. Forecast future metric values from the seasonal and trend components while ignoring the residual part. Detect anomalous values based on outlier detection on the residual part only. Further details can be found in the Time Series Decomposition chapter.

Examples

Weekly seasonality

In the following example, we generate a series with weekly seasonality and without trend, we then add some outliers to it. series_decompose finds and automatically detects the seasonality, and generates a baseline that is almost identical to the seasonal component. The outliers we added can be clearly seen in the residuals component.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 10.0, 15.0) - (((t%24)/10)*((t%24)/10)) // generate a series with weekly seasonality
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose(y)
| render timechart  

Series decompose 1.

Weekly seasonality with trend

In this example, we add a trend to the series from the previous example. First, we run series_decompose with the default parameters. The trend avg default value only takes the average and doesn’t compute the trend. The generated baseline doesn’t contain the trend. When observing the trend in the residuals, it becomes apparent that this example is less accurate than the previous example.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose(y)
| render timechart  

Series decompose 2.

Next, we rerun the same example. Since we’re expecting a trend in the series, we specify linefit in the trend parameter. We can see that the positive trend is detected and the baseline is much closer to the input series. The residuals are close to zero, and only the outliers stand out. We can see all the components on the series in the chart.

let ts=range t from 1 to 24*7*5 step 1 
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t 
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) - (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| summarize Timestamp=make_list(Timestamp, 10000),y=make_list(y, 10000);
ts 
| extend series_decompose(y, -1, 'linefit')
| render timechart  

Series decompose 3.

16.12 - series_divide()

Learn how to use the series_divide() function to calculate the element-wise division of two numeric series inputs.

Calculates the element-wise division of two numeric series inputs.

Syntax

series_divide(series1, series2)

Parameters

Name | Type | Required | Description
series1, series2 | dynamic | ✔️ | The numeric arrays over which to calculate the element-wise division. The first array is to be divided by the second.

Returns

Dynamic array of calculated element-wise divide operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Note: the result series is of double type, even if the inputs are integers. Division by zero follows the double division by zero (e.g. 2/0 yields double(+inf)).

Example

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_divide_s2 = series_divide(s1, s2)

Output

s1 | s2 | s1_divide_s2
[1,2,4] | [4,2,1] | [0.25,1.0,4.0]
[2,4,8] | [8,4,2] | [0.25,1.0,4.0]
[3,6,12] | [12,6,3] | [0.25,1.0,4.0]
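
As noted above, division by zero follows double semantics. A minimal sketch of that case (the expected output is based only on that note):

print s1 = dynamic([2, -2]), s2 = dynamic([0, 0])
| extend s1_divide_s2 = series_divide(s1, s2) // expected: [+inf, -inf]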

16.13 - series_dot_product()

This article describes series_dot_product().

Calculates the dot product of two numeric series.

The function series_dot_product() takes two numeric series as input, and calculates their dot product.

Syntax

series_dot_product(series1, series2)

Alternate syntax

series_dot_product(series, numeric)

series_dot_product(numeric, series)

Parameters

Name | Type | Required | Description
series1, series2 | dynamic | ✔️ | Input arrays with numeric data, to be element-wise multiplied and then summed into a value of type real.

Returns

Returns a value of type real whose value is the sum over the product of each element of series1 with the corresponding element of series2. If the two series aren’t of equal length, the longer series is truncated to the length of the shorter one. Any non-numeric element of the input series is ignored.

Example

range x from 1 to 3 step 1 
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_dot_product_s2 = series_dot_product(s1, s2)
s1 | s2 | s1_dot_product_s2
[1,2,4] | [4,2,1] | 12
[2,4,8] | [8,4,2] | 48
[3,6,12] | [12,6,3] | 108

range x from 1 to 3 step 1 
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = x
| extend s1_dot_product_s2 = series_dot_product(s1, s2)

s1 | s2 | s1_dot_product_s2
[1,2,4] | 1 | 7
[2,4,8] | 2 | 28
[3,6,12] | 3 | 63

16.14 - series_equals()

Learn how to use the series_equals() function to calculate the element-wise equals (==) logic operation of two numeric series inputs.

Calculates the element-wise equals (==) logic operation of two numeric series inputs.

Syntax

series_equals (series1, series2)

Parameters

Name | Type | Required | Description
series1, series2 | dynamic | ✔️ | The numeric arrays to be element-wise compared.

Returns

Dynamic array of booleans containing the calculated element-wise equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_equals_s2 = series_equals(s1, s2)

Output

s1 | s2 | s1_equals_s2
[1,2,4] | [4,2,1] | [false,true,false]

For entire series statistics comparisons, see series_stats() and series_stats_dynamic().

16.15 - series_exp()

Learn how to use the series_exp() function to calculate the element-wise base-e exponential function (e^x) of the numeric series input.

Calculates the element-wise base-e exponential function (e^x) of the numeric series input.

Syntax

series_exp(series)

Parameters

Name | Type | Required | Description
series | dynamic | ✔️ | An array of numeric values whose elements are applied as the exponent in the exponential function.

Returns

Dynamic array of calculated exponential function. Any non-numeric element yields a null element value.

Example

print s = dynamic([1,2,3])
| extend s_exp = series_exp(s)

Output

s | s_exp
[1,2,3] | [2.7182818284590451,7.38905609893065,20.085536923187668]

16.16 - series_fft()

Learn how to use the series_fft() function to apply the Fast Fourier Transform (FFT) on a series.

Applies the Fast Fourier Transform (FFT) on a series.

The series_fft() function takes a series of complex numbers in the time/spatial domain and transforms it to the frequency domain using the Fast Fourier Transform. The transformed complex series represents the magnitude and phase of the frequencies appearing in the original series. Use the complementary function series_ifft to transform from the frequency domain back to the time/spatial domain.

Syntax

series_fft(x_real [, x_imaginary])

Parameters

Name | Type | Required | Description
x_real | dynamic | ✔️ | A numeric array representing the real component of the series to transform.
x_imaginary | dynamic | | A similar array representing the imaginary component of the series. This parameter should only be specified if the input series contains complex numbers.

Returns

The function returns the complex FFT results in two series: the first series for the real component and the second one for the imaginary component.

Example

  • Generate a complex series, where the real and imaginary components are pure sine waves in different frequencies. Use FFT to transform it to the frequency domain:


    let sinewave=(x:double, period:double, gain:double=1.0, phase:double=0.0)
    {
        gain*sin(2*pi()/period*(x+phase))
    }
    ;
    let n=128;      //  signal length
    range x from 0 to n-1 step 1 | extend yr=sinewave(x, 8), yi=sinewave(x, 32)
    | summarize x=make_list(x), y_real=make_list(yr), y_imag=make_list(yi)
    | extend (fft_y_real, fft_y_imag) = series_fft(y_real, y_imag)
    | render linechart with(ysplit=panels)
    

    This query returns fft_y_real and fft_y_imag:

    Series fft.

  • Transform a series to the frequency domain, and then apply the inverse transform to get back the original series:


    let sinewave=(x:double, period:double, gain:double=1.0, phase:double=0.0)
    {
        gain*sin(2*pi()/period*(x+phase))
    }
    ;
    let n=128;      //  signal length
    range x from 0 to n-1 step 1 | extend yr=sinewave(x, 8), yi=sinewave(x, 32)
    | summarize x=make_list(x), y_real=make_list(yr), y_imag=make_list(yi)
    | extend (fft_y_real, fft_y_imag) = series_fft(y_real, y_imag)
    | extend (y_real2, y_image2) = series_ifft(fft_y_real, fft_y_imag)
    | project-away fft_y_real, fft_y_imag   //  too many series for linechart with panels
    | render linechart with(ysplit=panels)
    

    This query returns y_real2 and y_image2, which are the same as y_real and y_imag:

    Series ifft.

16.17 - series_fill_backward()

Learn how to use the series_fill_backward() function to perform a backward fill interpolation of missing values in a series.

Performs a backward fill interpolation of missing values in a series.

An expression containing dynamic numerical array is the input. The function replaces all instances of missing_value_placeholder with the nearest value from its right side (other than missing_value_placeholder), and returns the resulting array. The rightmost instances of missing_value_placeholder are preserved.

Syntax

series_fill_backward(series[,missing_value_placeholder])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
missing_value_placeholderscalarSpecifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null) and int(null) have the same meaning.

Returns

series with all instances of missing_value_placeholder filled backwards.

Example

let data = datatable(arr: dynamic)
    [
    dynamic([111, null, 36, 41, null, null, 16, 61, 33, null, null])   
];
data 
| project
    arr, 
    fill_backward = series_fill_backward(arr)

Output

arrfill_backward
[111,null,36,41,null,null,16,61,33,null,null][111,36,36,41,16,16,16,61,33,null,null]
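
The missing_value_placeholder parameter isn’t shown in the example above. The following sketch uses -1 as a hypothetical placeholder; per the description above, each -1 should be replaced with the nearest value to its right:

print arr = dynamic([5, -1, -1, 8, -1, 10])
| extend fill_backward = series_fill_backward(arr, -1)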

16.18 - series_fill_const()

Learn how to use the series_fill_const() function to replace missing values in a series with a specified constant value.

Replaces missing values in a series with a specified constant value.

Takes an expression containing dynamic numerical array as input, replaces all instances of missing_value_placeholder with the specified constant_value and returns the resulting array.

Syntax

series_fill_const(series, constant_value, [ missing_value_placeholder ])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
constant_valuescalar✔️The value used to replace the missing values.
missing_value_placeholderscalarSpecifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null) and int(null) have the same meaning.

Returns

series with all instances of missing_value_placeholder replaced with constant_value.

Example

let data = datatable(arr: dynamic)
    [
    dynamic([111, null, 36, 41, 23, null, 16, 61, 33, null, null])   
];
data 
| project
    arr, 
    fill_const1 = series_fill_const(arr, 0.0),
    fill_const2 = series_fill_const(arr, -1)  

Output

arrfill_const1fill_const2
[111,null,36,41,23,null,16,61,33,null,null][111,0.0,36,41,23,0.0,16,61,33,0.0,0.0][111,-1,36,41,23,-1,16,61,33,-1,-1]

16.19 - series_fill_forward()

Learn how to use the series_fill_forward() function to perform a forward fill interpolation of missing values in a series.

Performs a forward fill interpolation of missing values in a series.

An expression containing dynamic numerical array is the input. The function replaces all instances of missing_value_placeholder with the nearest value from its left side other than missing_value_placeholder, and returns the resulting array. The leftmost instances of missing_value_placeholder are preserved.

Syntax

series_fill_forward(series, [ missing_value_placeholder ])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
missing_value_placeholderscalarSpecifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null) and int(null) have the same meaning.

Returns

series with all instances of missing_value_placeholder filled forwards.

Example

let data = datatable(arr: dynamic)
    [
    dynamic([null, null, 36, 41, null, null, 16, 61, 33, null, null])   
];
data 
| project
    arr, 
    fill_forward = series_fill_forward(arr)  

Output

arrfill_forward
[null,null,36,41,null,null,16,61,33,null,null][null,null,36,41,41,41,16,61,33,33,33]

Use series_fill_backward() or series_fill_const() to complete the interpolation of the above array, as shown in the sketch below.
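
The following sketch (an assumption based simply on composing the two functions) first fills forward and then fills the remaining leading nulls backward, so the result should contain no nulls:

let data = datatable(arr: dynamic)
    [
    dynamic([null, null, 36, 41, null, null, 16, 61, 33, null, null])
];
data
| project
    arr,
    fill_both = series_fill_backward(series_fill_forward(arr))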

16.20 - series_fill_linear()

Learn how to use the series_fill_linear() function to linearly interpolate missing values in a series.

Linearly interpolates missing values in a series.

Takes an expression containing dynamic numerical array as input, does linear interpolation for all instances of missing_value_placeholder, and returns the resulting array. If the beginning and end of the array contain missing_value_placeholder, then it’s replaced with the nearest value other than missing_value_placeholder. This feature can be turned off. If the whole array consists of the missing_value_placeholder, the array is filled with constant_value, or 0 if not specified.

Syntax

series_fill_linear(series, [ missing_value_placeholder [,fill_edges [, constant_value ]]])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
missing_value_placeholderscalarSpecifies a placeholder for missing values. The default value is double(null). The value can be of any type that will be converted to actual element types. double(null), long(null) and int(null) have the same meaning.
fill_edgesboolIndicates whether missing_value_placeholder at the start and end of the array should be replaced with nearest value. true by default. If set to false, then missing_value_placeholder at the start and end of the array will be preserved.
constant_valuescalarRelevant only for arrays that consist entirely of null values. This parameter specifies a constant value with which to fill the series. The default value is 0. Setting this parameter to double(null) preserves the null values.

Returns

The series after linear interpolation of missing values using the specified parameters. If series contains only int or long elements, then the linear interpolation returns rounded interpolated values rather than exact ones.

Example

let data = datatable(arr: dynamic)
    [
    dynamic([null, 111.0, null, 36.0, 41.0, null, null, 16.0, 61.0, 33.0, null, null]), // Array of double    
    dynamic([null, 111, null, 36, 41, null, null, 16, 61, 33, null, null]), // Similar array of int
    dynamic([null, null, null, null])                                                   // Array with missing values only
];
data
| project
    arr, 
    without_args = series_fill_linear(arr),
    with_edges = series_fill_linear(arr, double(null), true),
    wo_edges = series_fill_linear(arr, double(null), false),
    with_const = series_fill_linear(arr, double(null), true, 3.14159)  

Output

arrwithout_argswith_edgeswo_edgeswith_const
[null,111.0,null,36.0,41.0,null,null,16.0,61.0,33.0,null,null][111.0,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,33.0,33.0][111.0,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,33.0,33.0][null,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,null,null][111.0,111.0,73.5,36.0,41.0,32.667,24.333,16.0,61.0,33.0,33.0,33.0]
[null,111,null,36,41,null,null,16,61,33,null,null][111,111,73,36,41,32,24,16,61,33,33,33][111,111,73,36,41,32,24,16,61,33,33,33][null,111,73,36,41,32,24,16,61,33,null,null][111,111,74,38, 41,32,24,16,61,33,33,33]
[null,null,null,null][0.0,0.0,0.0,0.0][0.0,0.0,0.0,0.0][0.0,0.0,0.0,0.0][3.14159,3.14159,3.14159,3.14159]

16.21 - series_fir()

Learn how to use the series_fir() function to apply a Finite Impulse Response (FIR) filter on a series.

Applies a Finite Impulse Response (FIR) filter on a series.

The function takes an expression containing a dynamic numerical array as input and applies a Finite Impulse Response filter. By specifying the filter coefficients, it can be used for calculating a moving average, smoothing, change-detection, and many more use cases. The function takes the column containing the dynamic array and a static dynamic array of the filter’s coefficients as input, and applies the filter on the column. It outputs a new dynamic array column, containing the filtered output.

Syntax

series_fir(series, filter [, normalize[, center]])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
filterdynamic✔️An array of numeric values containing the coefficients of the filter.
normalizeboolIndicates whether the filter should be normalized, that is, divided by the sum of the coefficients. If filter contains negative values, then normalize must be specified as false, otherwise the result is null. If not specified, the default depends on the presence of negative values in filter: if all coefficients are non-negative, normalize defaults to true; if at least one coefficient is negative, it defaults to false.
centerboolIndicates whether the filter is applied symmetrically on a time window before and after the current point, or on a time window from the current point backwards. By default, center is false, which fits the scenario of streaming data so that we can only apply the filter on the current and older points. However, for ad-hoc processing you can set it to true, keeping it synchronized with the time series. See examples below. This parameter controls the filter’s group delay.

Returns

A new dynamic array column containing the filtered output.

Examples

  • Calculate a moving average of five points by setting filter=[1,1,1,1,1] and normalize=true (default). Note the effect of center=false (default) vs. true:
range t from bin(now(), 1h) - 23h to bin(now(), 1h) step 1h
| summarize t=make_list(t)
| project
    id='TS',
    val=dynamic([0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 20, 40, 100, 40, 20, 10, 0, 0, 0, 0, 0, 0, 0, 0]),
    t
| extend
    5h_MovingAvg=series_fir(val, dynamic([1, 1, 1, 1, 1])),
    5h_MovingAvg_centered=series_fir(val, dynamic([1, 1, 1, 1, 1]), true, true)
| render timechart

This query returns:
5h_MovingAvg: Five-point moving average filter. The spike is smoothed and its peak is shifted by (5-1)/2 = 2h.
5h_MovingAvg_centered: The same filter, but because center is set to true, the peak stays in its original location.

Series fir.

  • To calculate the difference between a point and its preceding one, set filter=[1,-1].
range t from bin(now(), 1h) - 11h to bin(now(), 1h) step 1h
| summarize t=make_list(t)
| project id='TS', t, value=dynamic([0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 3, 3])
| extend diff=series_fir(value, dynamic([1, -1]), false, false)
| render timechart

Series fir 2.
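
  • The filter coefficients don’t have to be equal. The following sketch applies a weighted moving average in which, assuming (as in the difference example above) that the first coefficient applies to the current point, the current point gets weight 3, the previous point weight 2, and the point before that weight 1. Because all coefficients are non-negative, normalize defaults to true and the coefficients are divided by their sum:

range t from bin(now(), 1h) - 11h to bin(now(), 1h) step 1h
| summarize t=make_list(t)
| project id='TS', t, val=dynamic([0, 0, 0, 10, 20, 40, 100, 40, 20, 10, 0, 0])
| extend weighted_ma=series_fir(val, dynamic([3, 2, 1]))
| render timechart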

16.22 - series_fit_2lines_dynamic()

Learn how to use the series_fit_2lines_dynamic() function to apply two segments linear regression on a dynamic numerical array.

Applies two segments linear regression on a series, returning a dynamic object.

Takes an expression containing a dynamic numerical array as input and applies a two-segment linear regression in order to identify and quantify trend changes in a series. The function iterates on the series indexes. In each iteration, it splits the series into two parts, fits a separate line to each part using series_fit_line() or series_fit_line_dynamic(), and calculates the total R-squared value. The best split is the one that maximizes R-squared. The function returns its parameters in a dynamic value with the following content:

  • rsquare: R-squared is a standard measure of the fit quality. It’s a number in the range of [0-1], where 1 is the best possible fit, and 0 means the data is unordered and don’t fit any line.
  • split_idx: the index of breaking point to two segments (zero-based).
  • variance: variance of the input data.
  • rvariance: residual variance, that is, the variance between the input data values and the approximated ones (by the two line segments).
  • line_fit: numerical array holding a series of values of the best fitted line. The series length is equal to the length of the input array. It’s used for charting.
  • right.rsquare: r-square of the line on the right side of the split, see series_fit_line() or series_fit_line_dynamic().
  • right.slope: slope of the right approximated line (of the form y=ax+b).
  • right.interception: interception of the approximated right line (b from y=ax+b).
  • right.variance: variance of the input data on the right side of the split.
  • right.rvariance: residual variance of the input data on the right side of the split.
  • left.rsquare: r-square of the line on the left side of the split, see series_fit_line() or series_fit_line_dynamic().
  • left.slope: slope of the left approximated line (of the form y=ax+b).
  • left.interception: interception of the approximated left line (of the form y=ax+b).
  • left.variance: variance of the input data on the left side of the split.
  • left.rvariance: residual variance of the input data on the left side of the split.

This function is similar to series_fit_2lines(), but unlike series_fit_2lines() it returns a dynamic bag.

Syntax

series_fit_2lines_dynamic(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.

Example

print
    id=' ',
    x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
    y=dynamic([1, 2.2, 2.5, 4.7, 5.0, 12, 10.3, 10.3, 9, 8.3, 6.2])
| extend
    LineFit=series_fit_line_dynamic(y).line_fit,
    LineFit2=series_fit_2lines_dynamic(y).line_fit
| project id, x, y, LineFit, LineFit2
| render timechart

Series fit 2 lines.
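
Because the result is a dynamic bag, individual fields can be extracted with dot notation. A minimal sketch reusing the series from the example above (the projected column names are arbitrary):

print y=dynamic([1, 2.2, 2.5, 4.7, 5.0, 12, 10.3, 10.3, 9, 8.3, 6.2])
| extend fit=series_fit_2lines_dynamic(y)
| project rsquare=fit.rsquare, split_idx=fit.split_idx, left_slope=fit.left.slope, right_slope=fit.right.slope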

16.23 - series_fit_2lines()

Learn how to use the series_fit_2lines() function to apply a two segmented linear regression on a series.

Applies a two segmented linear regression on a series, returning multiple columns.

Takes an expression containing a dynamic numerical array as input and applies a two-segment linear regression in order to identify and quantify a trend change in a series. The function iterates on the series indexes. In each iteration, the function splits the series into two parts, fits a separate line (using series_fit_line()) to each part, and calculates the total r-square. The best split is the one that maximizes r-square; the function returns its parameters:

ParameterDescription
rsquareR-square is a standard measure of the fit quality. It’s a number in the range [0-1], where 1 is the best possible fit, and 0 means the data is unordered and doesn’t fit any line.
split_idxThe index of breaking point to two segments (zero-based).
varianceVariance of the input data.
rvarianceResidual variance, which is the variance between the input data values and the approximated ones (by the two line segments).
line_fitNumerical array holding a series of values of the best fitted line. The series length is equal to the length of the input array. It’s mainly used for charting.
right_rsquareR-square of the line on the right side of the split, see series_fit_line().
right_slopeSlope of the right approximated line (of the form y=ax+b).
right_interceptionInterception of the approximated right line (b from y=ax+b).
right_varianceVariance of the input data on the right side of the split.
right_rvarianceResidual variance of the input data on the right side of the split.
left_rsquareR-square of the line on the left side of the split, see series_fit_line().
left_slopeSlope of the left approximated line (of the form y=ax+b).
left_interceptionInterception of the approximated left line (of the form y=ax+b).
left_varianceVariance of the input data on the left side of the split.
left_rvarianceResidual variance of the input data on the left side of the split.

Syntax

project series_fit_2lines(series)

  • Will return all the columns mentioned above with the following names: series_fit_2lines_x_rsquare, series_fit_2lines_x_split_idx, and so on.

project (rs, si, v)=series_fit_2lines(series)

  • Will return the following columns: rs (r-square), si (split index), and v (variance); the rest will be named series_fit_2lines_x_rvariance, series_fit_2lines_x_line_fit, and so on.

extend (rs, si, v)=series_fit_2lines(series)

  • Will return only: rs (r-square), si (split index) and v (variance).

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.

Examples

print
    id=' ',
    x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
    y=dynamic([1, 2.2, 2.5, 4.7, 5.0, 12, 10.3, 10.3, 9, 8.3, 6.2])
| extend
    (Slope, Interception, RSquare, Variance, RVariance, LineFit)=series_fit_line(y),
    (RSquare2, SplitIdx, Variance2, RVariance2, LineFit2)=series_fit_2lines(y)
| project id, x, y, LineFit, LineFit2
| render timechart

Series fit 2 lines.

16.24 - series_fit_line_dynamic()

Learn how to use the series_fit_line_dynamic() function to apply a linear regression on a series to return a dynamic object.

Applies linear regression on a series, returning dynamic object.

Takes an expression containing dynamic numerical array as input, and does linear regression to find the line that best fits it. This function should be used on time series arrays, fitting the output of make-series operator. It generates a dynamic value with the following content:

  • rsquare: r-square is a standard measure of the fit quality. It’s a number in the range [0-1], where 1 is the best possible fit, and 0 means the data is unordered and doesn’t fit any line
  • slope: Slope of the approximated line (the a-value from y=ax+b)
  • variance: Variance of the input data
  • rvariance: Residual variance that is the variance between the input data values and the approximated ones.
  • interception: Interception of the approximated line (the b-value from y=ax+b)
  • line_fit: Numerical array containing a series of values of the best fit line. The series length is equal to the length of the input array. It’s used mainly for charting.

This function is similar to series_fit_line(), but unlike series_fit_line() it returns a dynamic bag.

Syntax

series_fit_line_dynamic(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.

Examples

print
    id=' ',
    x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
    y=dynamic([2, 5, 6, 8, 11, 15, 17, 18, 25, 26, 30, 30])
| extend fit=series_fit_line_dynamic(y)
| extend
    RSquare=fit.rsquare,
    Slope=fit.slope,
    Variance=fit.variance,
    RVariance=fit.rvariance,
    Interception=fit.interception,
    LineFit=fit.line_fit
| render timechart

Series fit line.

RSquareSlopeVarianceRVarianceInterceptionLineFit
0.9822.73098.6281.686-1.6661.064, 3.7945, 6.526, 9.256, 11.987, 14.718, 17.449, 20.180, 22.910, 25.641, 28.371, 31.102

16.25 - series_fit_line()

Learn how to use the series_fit_line() function to apply a linear regression on a series to return multiple columns.

Applies linear regression on a series, returning multiple columns.

Takes an expression containing dynamic numerical array as input and does linear regression to find the line that best fits it. This function should be used on time series arrays, fitting the output of make-series operator. The function generates the following columns:

  • rsquare: r-square is a standard measure of the fit quality. The value is a number in the range [0-1], where 1 is the best possible fit, and 0 means the data is unordered and doesn’t fit any line.
  • slope: Slope of the approximated line (“a” from y=ax+b).
  • variance: Variance of the input data.
  • rvariance: Residual variance, that is, the variance between the input data values and the approximated ones.
  • interception: Interception of the approximated line (“b” from y=ax+b).
  • line_fit: Numerical array holding a series of values of the best fitted line. The series length is equal to the length of the input array. The value’s used for charting.

Syntax

series_fit_line(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.

Examples

print
    id=' ',
    x=range(bin(now(), 1h) - 11h, bin(now(), 1h), 1h),
    y=dynamic([2, 5, 6, 8, 11, 15, 17, 18, 25, 26, 30, 30])
| extend (RSquare, Slope, Variance, RVariance, Interception, LineFit)=series_fit_line(y)
| render timechart

Series fit line.

RSquareSlopeVarianceRVarianceInterceptionLineFit
0.9822.73098.6281.686-1.6661.064, 3.7945, 6.526, 9.256, 11.987, 14.718, 17.449, 20.180, 22.910, 25.641, 28.371, 31.102

16.26 - series_fit_poly()

Learn how to use the series_fit_poly() to apply a polynomial regression from an independent variable (x_series) to a dependent variable (y_series).

Applies a polynomial regression from an independent variable (x_series) to a dependent variable (y_series). This function takes a table containing multiple series (dynamic numerical arrays) and generates the best fit high-order polynomial for each series using polynomial regression.

Syntax

T | extend series_fit_poly(y_series [, x_series, degree ])

Parameters

NameTypeRequiredDescription
y_seriesdynamic✔️An array of numeric values containing the dependent variable.
x_seriesdynamicAn array of numeric values containing the independent variable. Required only for unevenly spaced series. If not specified, it’s set to a default value of [1, 2, …, length(y_series)].
degreeintThe required order of the polynomial to fit. For example, 1 for linear regression, 2 for quadratic regression, and so on. Defaults to 1, which indicates linear regression.

Returns

The series_fit_poly() function returns the following columns:

  • rsquare: r-square is a standard measure of the fit quality. The value is a number in the range [0-1], where 1 is the best possible fit, and 0 means the data is unordered and doesn’t fit any line.
  • coefficients: Numerical array holding the coefficients of the best fitted polynomial with the given degree, ordered from the highest power coefficient to the lowest.
  • variance: Variance of the dependent variable (y_series).
  • rvariance: Residual variance, that is, the variance between the input data values and the approximated ones.
  • poly_fit: Numerical array holding a series of values of the best fitted polynomial. The series length is equal to the length of the dependent variable (y_series). The value’s used for charting.

Examples

Example 1

A fifth order polynomial with noise on x & y axes:

range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend series_fit_poly(y, x, 5)
| project-rename fy=series_fit_poly_y_poly_fit, coeff=series_fit_poly_y_coefficients
| fork (project x, y, fy) (project-away x, y, fy)
| render linechart 

Graph showing fifth order polynomial fit to a series with noise.

Coefficients of fifth order polynomial fit to  a series with noise.

Example 2

Verify that series_fit_poly with degree=1 matches series_fit_line:

demo_series1
| extend series_fit_line(y)
| extend series_fit_poly(y)
| project-rename y_line = series_fit_line_y_line_fit, y_poly = series_fit_poly_y_poly_fit
| fork (project x, y, y_line, y_poly) (project-away id, x, y, y_line, y_poly) 
| render linechart with(xcolumn=x, ycolumns=y, y_line, y_poly)

Graph showing linear regression.

Coefficients of linear regression.

Example 3

Irregular (unevenly spaced) time series:

//
//  The x-axis must be normalized to the range [0-1] if either the degree is relatively big (>= 5) or the original x range is big.
//  If x is a time axis, it must be normalized, because converting a timestamp to long generates huge numbers (the number of 100 nano-sec ticks from 1/1/1970).
//
//  Normalization: x_norm = (x - min(x))/(max(x) - min(x))
//
irregular_ts
| extend series_stats(series_add(TimeStamp, 0))                                                                 //  extract min/max of time axis as doubles
| extend x = series_divide(series_subtract(TimeStamp, series_stats__min), series_stats__max-series_stats__min)  // normalize time axis to [0-1] range
| extend series_fit_poly(num, x, 8)
| project-rename fnum=series_fit_poly_num_poly_fit
| render timechart with(ycolumns=num, fnum)

Graph showing eighth order polynomial fit to an irregular time series.

16.27 - series_floor()

Learn how to use the series_floor() function to calculate the element-wise floor function of the numeric series input.

Calculates the element-wise floor function of the numeric series input.

Syntax

series_floor(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values on which the floor function is applied.

Returns

Dynamic array of the calculated floor function. Any non-numeric element yields a null element value.

Example

print s = dynamic([-1.5,1,2.5])
| extend s_floor = series_floor(s)

Output

ss_floor
[-1.5,1,2.5][-2.0,1.0,2.0]

16.28 - series_greater_equals()

Learn how to use the series_greater_equals() function to calculate the element-wise greater or equals (>=) logic operation of two numeric series inputs.

Calculates the element-wise greater or equals (>=) logic operation of two numeric series inputs.

Syntax

series_greater_equals(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️The arrays of numeric values to be element-wise compared.

Returns

Dynamic array of booleans containing the calculated element-wise greater or equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_greater_equals_s2 = series_greater_equals(s1, s2)

Output

s1s2s1_greater_equals_s2
[1,2,4][4,2,1][false,true,true]

For entire series statistics comparisons, see series_stats(), series_stats_dynamic(), and series_pearson_correlation().

16.29 - series_greater()

Learn how to use the series_greater() function to calculate the element-wise greater (>) logic operation of two numeric series inputs.

Calculates the element-wise greater (>) logic operation of two numeric series inputs.

Syntax

series_greater(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️The arrays of numeric values to be element-wise compared.

Returns

Dynamic array of booleans containing the calculated element-wise greater logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_greater_s2 = series_greater(s1, s2)

Output

s1s2s1_greater_s2
[1,2,4][4,2,1][false,false,true]

For entire series statistics comparisons, see series_stats(), series_stats_dynamic(), and series_pearson_correlation().

16.30 - series_ifft()

Learn how to use the series_ifft() function to apply the Inverse Fast Fourier Transform (IFFT) on a series.

Applies the Inverse Fast Fourier Transform (IFFT) on a series.

The series_ifft() function takes a series of complex numbers in the frequency domain and transforms it back to the time/spatial domain using the Fast Fourier Transform. This function is the complementary function of series_fft. Commonly the original series is transformed to the frequency domain for spectral processing and then back to the time/spatial domain.

Syntax

series_ifft(fft_real [, fft_imaginary])

Parameters

NameTypeRequiredDescription
fft_realdynamic✔️An array of numeric values representing the real component of the series to transform.
fft_imaginarydynamicAn array of numeric values representing the imaginary component of the series. This parameter should be specified only if the input series contains complex numbers.

Returns

The function returns the complex inverse FFT in two series: the first series for the real component and the second for the imaginary component.

Example

See series_fft

16.31 - series_iir()

Learn how to use the series_iir() function to apply an Infinite Impulse Response filter on a series.

Applies an Infinite Impulse Response filter on a series.

The function takes an expression containing a dynamic numerical array as input, and applies an Infinite Impulse Response filter. By specifying the filter coefficients, you can use the function for tasks such as calculating a cumulative sum (see the example below) or applying various smoothing and filtering operations.

The function takes as input the column containing the dynamic array and two static dynamic arrays of the filter’s denominators and numerators coefficients, and applies the filter on the column. It outputs a new dynamic array column, containing the filtered output.

Syntax

series_iir(series, numerators , denominators)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values, typically the resulting output of make-series or make_list operators.
numeratorsdynamic✔️An array of numeric values, containing the numerator coefficients of the filter.
denominatorsdynamic✔️An array of numeric values, containing the denominator coefficients of the filter.

The filter’s recursive formula

  • Consider an input array X, and coefficient arrays a and b of lengths n_a and n_b respectively. The transfer function of the filter that generates the output array Y is defined by:

    y(i) = (1/a(0)) * (b(0)*x(i) + b(1)*x(i-1) + ... + b(nb-1)*x(i-nb+1) - a(1)*y(i-1) - a(2)*y(i-2) - ... - a(na-1)*y(i-na+1))

Example

Calculate a cumulative sum. Use the iir filter with coefficients denominators=[1,-1] and numerators=[1]:

let x = range(1.0, 10, 1);
print x=x, y = series_iir(x, dynamic([1]), dynamic([1,-1]))
| mv-expand x, y

Output

xy
1.01.0
2.03.0
3.06.0
4.010.0

Here’s how to wrap it in a function:

let vector_sum=(x: dynamic) {
    let y=array_length(x) - 1;
    todouble(series_iir(x, dynamic([1]), dynamic([1, -1]))[y])
};
print d=dynamic([0, 1, 2, 3, 4])
| extend dd=vector_sum(d)

Output

ddd
[0,1,2,3,4]10
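
Another common use of an IIR filter is exponential smoothing, y(i) = w*x(i) + (1-w)*y(i-1). Per the recursive formula above, this corresponds to numerators=[w] and denominators=[1, -(1-w)]. A hedged sketch with w = 0.25:

print x = dynamic([10, 10, 10, 0, 0, 0, 10, 10, 10])
| extend smoothed = series_iir(x, dynamic([0.25]), dynamic([1, -0.75]))
| mv-expand x, smoothed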

16.32 - series_less_equals()

Learn how to use the series_less_equals() function to calculate the element-wise less or equal (<=) logic operation of two numeric series inputs.

Calculates the element-wise less or equal (<=) logic operation of two numeric series inputs.

Syntax

series_less_equals(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️The arrays of numeric values to be element-wise compared.

Returns

Dynamic array of booleans containing the calculated element-wise less or equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_less_equals_s2 = series_less_equals(s1, s2)

Output

s1s2s1_less_equals_s2
[1,2,4][4,2,1][true,true,false]

For entire series statistics comparisons, see series_stats(), series_stats_dynamic(), and series_pearson_correlation().

16.33 - series_less()

Learn how to use the series_less() function to calculate the element-wise less (<) logic operation of two numeric series inputs.

Calculates the element-wise less (<) logic operation of two numeric series inputs.

Syntax

series_less(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️The arrays of numeric values to be element-wise compared.

Returns

Dynamic array of booleans containing the calculated element-wise less logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_less_s2 = series_less(s1, s2)

Output

s1s2s1_less_s2
[1,2,4][4,2,1][true,false,false]

For entire series statistics comparisons, see series_stats(), series_stats_dynamic(), and series_pearson_correlation().

16.34 - series_log()

Learn how to use the series_log() function to calculate the element-wise natural logarithm function (base-e) of the numeric series input.

Calculates the element-wise natural logarithm function (base-e) of the numeric series input.

Syntax

series_log(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values on which the natural logarithm function is applied.

Returns

Dynamic array of the calculated natural logarithm function. Any non-numeric element yields a null element value.

Example

print s = dynamic([1,2,3])
| extend s_log = series_log(s)

Output

ss_log
[1,2,3][0.0,0.69314718055994529,1.0986122886681098]

16.35 - series_magnitude()

Learn how to use the series_magnitude() function to calculate the magnitude of series elements.

Calculates the magnitude of series elements. This is equivalent to the square root of the dot product of the series with itself.

Syntax

series_magnitude(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️Array of numeric values.

Returns

Returns a double type value representing the magnitude of the series.

Example

print arr=dynamic([1,2,3,4]) 
| extend series_magnitude=series_magnitude(arr)

Output

arrseries_magnitude
[1,2,3,4]5.4772255750516612
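
As a sanity check of the equivalence stated above, the following sketch computes the same value through series_dot_product() and sqrt(); both columns should return sqrt(1 + 4 + 9 + 16) ≈ 5.477:

print arr=dynamic([1,2,3,4])
| extend series_magnitude=series_magnitude(arr)
| extend via_dot_product=sqrt(series_dot_product(arr, arr))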

16.36 - series_multiply()

Learn how to use the series_multiply() function to calculate the element-wise multiplication of two numeric series inputs.

Calculates the element-wise multiplication of two numeric series inputs.

Syntax

series_multiply(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️The arrays of numeric values to be element-wise multiplied.

Returns

Dynamic array of calculated element-wise multiplication operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_multiply_s2 = series_multiply(s1, s2)

Output

s1s2s1_multiply_s2
[1,2,4][4,2,1][4,4,4]
[2,4,8][8,4,2][16,16,16]
[3,6,12][12,6,3][36,36,36]

16.37 - series_not_equals()

Learn how to use the series_not_equals() function to calculate the element-wise not equals (!=) logic operation of two numeric series inputs.

Calculates the element-wise not equals (!=) logic operation of two numeric series inputs.

Syntax

series_not_equals(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️The arrays of numeric values to be element-wise compared.

Returns

Dynamic array of booleans containing the calculated element-wise not equal logic operation between the two inputs. Any non-numeric element or non-existing element (arrays of different sizes) yields a null element value.

Example

print s1 = dynamic([1,2,4]), s2 = dynamic([4,2,1])
| extend s1_not_equals_s2 = series_not_equals(s1, s2)

Output

s1s2s1_not_equals_s2
[1,2,4][4,2,1][true,false,true]

For entire series statistics comparisons, see series_stats(), series_stats_dynamic(), and series_pearson_correlation().

16.38 - series_outliers()

Learn how to use the series_outliers() function to score anomaly points in a series.

Scores anomaly points in a series.

The function takes an expression with a dynamic numerical array as input, and generates a dynamic numeric array of the same length. Each value of the array indicates a score of a possible anomaly, using “Tukey’s test”. A value greater than 1.5 in the same element of the input indicates a rise anomaly. A value less than -1.5 indicates a decline anomaly.

Syntax

series_outliers(series [, kind ] [, ignore_val ] [, min_percentile ] [, max_percentile ])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
kindstringThe algorithm to use for outlier detection. The supported options are "tukey", which is traditional “Tukey”, and "ctukey", which is custom “Tukey”. The default is "ctukey".
ignore_valint, long, or realA numeric value indicating the missing values in the series. The default is double(null). The score of nulls and ignore values is set to 0.
min_percentileint, long, or realThe minimum percentile to use to calculate the normal inter-quantile range. The default is 10. The value must be in the range [2.0, 98.0]. This parameter is only relevant for the "ctukey" kind.
max_percentileint, long, or realThe maximum percentile to use to calculate the normal inter-quantile range. The default is 90. The value must be in the range [2.0, 98.0]. This parameter is only relevant for the "ctukey" kind.

The following table describes differences between "tukey" and "ctukey":

AlgorithmDefault quantile rangeSupports custom quantile range
"tukey"25% / 75%No
"ctukey"10% / 90%Yes

Example

range x from 0 to 364 step 1 
| extend t = datetime(2023-01-01) + 1d*x
| extend y = rand() * 10
| extend y = iff(monthofyear(t) != monthofyear(prev(t)), y+20, y) // generate a sample series with outliers at first day of each month
| summarize t = make_list(t), series = make_list(y)
| extend outliers=series_outliers(series)
| extend pos_anomalies = array_iff(series_greater_equals(outliers, 1.5), 1, 0)
| render anomalychart with(xcolumn=t, ycolumns=series, anomalycolumns=pos_anomalies)

Chart of a time series with outliers.
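
The min_percentile and max_percentile parameters, relevant only for the default "ctukey" kind, aren’t used in the example above. The following sketch compares the default 10/90 range with a hypothetical custom 5/95 range on a small array containing one obvious spike; the exact scores depend on the quantile range chosen:

print series = dynamic([10, 12, 11, 13, 12, 100, 11, 12, 13, 10])
| extend outliers_default = series_outliers(series)
| extend outliers_custom = series_outliers(series, 'ctukey', double(null), 5.0, 95.0)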

16.39 - series_pearson_correlation()

Learn how to use the series_pearson_correlation() function to calculate the pearson correlation coefficient of two numeric series inputs.

Calculates the pearson correlation coefficient of two numeric series inputs.

See: Pearson correlation coefficient.

Syntax

series_pearson_correlation(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️The arrays of numeric values for calculating the correlation coefficient.

Returns

The calculated Pearson correlation coefficient between the two inputs. Any non-numeric element or nonexisting element (arrays of different sizes) yields a null result.

Example

range s1 from 1 to 5 step 1
| extend s2 = 2 * s1 // Perfect correlation
| summarize s1 = make_list(s1), s2 = make_list(s2)
| extend correlation_coefficient = series_pearson_correlation(s1, s2)

Output

s1s2correlation_coefficient
[1,2,3,4,5][2,4,6,8,10]1

16.40 - series_periods_detect()

Learn how to use the series_periods_detect() function to find the most significant periods that exist in a time series.

Finds the most significant periods within a time series.

The series_periods_detect() function is useful for detecting periodic patterns in data, such as daily, weekly, or monthly cycles.

Syntax

series_periods_detect(series, min_period, max_period, num_periods)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values, typically the resulting output of the make-series or make_list operators.
min_periodreal✔️The minimal period length for which to search.
max_periodreal✔️The maximal period length for which to search.
num_periodslong✔️The maximum number of periods to return. This number is the length of the output dynamic arrays.

Returns

The function returns a table with two columns:

  • periods: A dynamic array containing the periods found, in units of the bin size, ordered by their scores.
  • scores: A dynamic array containing values between 0 and 1. Each array measures the significance of a period in its respective position in the periods array.

Example

The following query embeds a snapshot of application traffic for one month. The amount of traffic is aggregated twice a day, meaning the bin size is 12 hours. The query produces a line chart clearly showing a pattern in the data.

print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y  
| render linechart

Series periods.

You can run the series_periods_detect() function on the same series to identify the recurring patterns. The function searches for patterns in the specified period range and returns two values. The first value indicates a detected pattern that is 14 points long, with a score of approximately 0.84. The other value is zero, which indicates that no additional pattern was found.

print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y  
| project series_periods_detect(y, 0.0, 50.0, 2)

Output

series_periods_detect_y_periodsseries_periods_detect_y_periods_scores
[14, 0][0.84, 0]

The value in series_periods_detect_y_periods_scores is truncated.

16.41 - series_periods_validate()

Learn how to use the series_periods_validate() function to check whether a time series contains periodic patterns of given lengths.

Checks whether a time series contains periodic patterns of given lengths.

Often a metric measuring the traffic of an application is characterized by a weekly or daily period. Such a period can be confirmed by running series_periods_validate() to check for weekly and daily periods.

Syntax

series_periods_validate(series, period1 [ , period2 , . . . ] )

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values, typically the resulting output of make-series or make_list operators.
period1, period2, etc.real✔️The periods to validate in units of the bin size. For example, if the series is in 1h bins, a weekly period is 168 bins. At least one period is required.

Returns

The function outputs a table with two columns:

  • periods: A dynamic array that contains the periods to validate as supplied in the input.
  • scores: A dynamic array that contains a score between 0 and 1. The score shows the significance of a period in its respective position in the periods array.

Example

The following query embeds a snapshot of a month of an application’s traffic, aggregated twice a day (the bin size is 12 hours).

print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y  
| render linechart

Series periods.

Running series_periods_validate() on this series to validate a weekly period (14 points long) results in a high score, while validating a five-day period (10 points long) results in a score of 0.

print y=dynamic([80, 139, 87, 110, 68, 54, 50, 51, 53, 133, 86, 141, 97, 156, 94, 149, 95, 140, 77, 61, 50, 54, 47, 133, 72, 152, 94, 148, 105, 162, 101, 160, 87, 63, 53, 55, 54, 151, 103, 189, 108, 183, 113, 175, 113, 178, 90, 71, 62, 62, 65, 165, 109, 181, 115, 182, 121, 178, 114, 170])
| project x=range(1, array_length(y), 1), y  
| project series_periods_validate(y, 14.0, 10.0)

Output

series_periods_validate_y_periodsseries_periods_validate_y_scores
[14.0, 10.0][0.84, 0.0]

16.42 - series_seasonal()

Learn how to use the series_seasonal() function to calculate the seasonal component of a series according to the detected seasonal period.

Calculates the seasonal component of a series, according to the detected or given seasonal period.

Syntax

series_seasonal(series [, period ])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
periodintThe number of bins for each seasonal period. This value can be any positive integer. By default, the value is set to -1, which automatically detects the period using series_periods_detect() with a threshold of 0.7; if seasonality isn’t detected, the function returns zeros. If a value other than -1 or a positive integer is set, seasonality is ignored and a series of zeros is returned.

Returns

A dynamic array of the same length as the series input that contains the calculated seasonal component of the series. The seasonal component is calculated as the median of all the values that correspond to the location of the bin, across the periods.

Examples

Auto detect the period

In the following example, the series’ period is automatically detected. The first series’ period is detected to be six bins and the second five bins. The third series’ period is too short to be detected and returns a series of zeroes. See the next example on how to force the period.

print s=dynamic([2, 5, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2, 1])
| union (print s=dynamic([8, 12, 14, 12, 10, 10, 12, 14, 12, 10, 10, 12, 14, 12, 10, 10, 12, 14, 12, 10]))
| union (print s=dynamic([1, 3, 5, 2, 4, 6, 1, 3, 5, 2, 4, 6]))
| extend s_seasonal = series_seasonal(s)

Output

ss_seasonal
[2,5,3,4,3,2,1,2,3,4,3,2,1,2,3,4,3,2,1,2,3,4,3,2,1][1.0,2.0,3.0,4.0,3.0,2.0,1.0,2.0,3.0,4.0,3.0,2.0,1.0,2.0,3.0,4.0,3.0,2.0,1.0,2.0,3.0,4.0,3.0,2.0,1.0]
[8,12,14,12,10,10,12,14,12,10,10,12,14,12,10,10,12,14,12,10][10.0,12.0,14.0,12.0,10.0,10.0,12.0,14.0,12.0,10.0,10.0,12.0,14.0,12.0,10.0,10.0,12.0,14.0,12.0,10.0]
[1,3,5,2,4,6,1,3,5,2,4,6][0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]

Force a period

In this example, the series’ period is too short to be detected by series_periods_detect(), so we explicitly force the period to get the seasonal pattern.

print s=dynamic([1, 3, 5, 1, 3, 5, 2, 4, 6]) 
| union (print s=dynamic([1, 3, 5, 2, 4, 6, 1, 3, 5, 2, 4, 6]))
| extend s_seasonal = series_seasonal(s, 3)

Output

ss_seasonal
[1,3,5,1,3,5,2,4,6][1.0,3.0,5.0,1.0,3.0,5.0,1.0,3.0,5.0]
[1,3,5,2,4,6,1,3,5,2,4,6][1.5,3.5,5.5,1.5,3.5,5.5,1.5,3.5,5.5,1.5,3.5,5.5]

16.43 - series_sign()

Learn how to use the series_sign() function to calculate the element-wise sign of the numeric series input.

Calculates the element-wise sign of the numeric series input.

Syntax

series_sign(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values over which the sign function is applied.

Returns

A dynamic array of calculated sign function values. -1 for negative, 0 for 0, and 1 for positive. Any non-numeric element yields a null element value.

Example

print arr = dynamic([-6, 0, 8])
| extend arr_sign = series_sign(arr)

Output

arrarr_sign
[-6,0,8][-1,0,1]

16.44 - series_sin()

Learn how to use the series_sin() function to calculate the element-wise sine of the numeric series input.

Calculates the element-wise sine of the numeric series input.

Syntax

series_sin(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values over which the sine function is applied.

Returns

A dynamic array of calculated sine function values. Any non-numeric element yields a null element value.

Example

print arr = dynamic([-1, 0, 1])
| extend arr_sin = series_sin(arr)

Output

arrarr_sin
[-1,0,1][-0.8414709848078965,0.0,0.8414709848078965]

16.45 - series_stats_dynamic()

Learn how to use the series_stats_dynamic() function to calculate the statistics for a series in a dynamic object.

Returns statistics for a series in a dynamic object.

Syntax

series_stats_dynamic(series [, ignore_nonfinite ])

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values.
ignore_nonfiniteboolIndicates whether to calculate the statistics while ignoring non-finite values, such as null, NaN, inf, and so on. The default is false, which returns null if non-finite values are present in the array.

Returns

A dynamic property bag object with the following content:

  • min: The minimum value in the input array.
  • min_idx: The first position of the minimum value in the input array.
  • max: The maximum value in the input array.
  • max_idx: The first position of the maximum value in the input array.
  • avg: The average value of the input array.
  • variance: The sample variance of input array.
  • stdev: The sample standard deviation of the input array.
  • sum: The sum of the values in the input array.
  • len: The length of the input array.

Example

print x=dynamic([23, 46, 23, 87, 4, 8, 3, 75, 2, 56, 13, 75, 32, 16, 29]) 
| project stats=series_stats_dynamic(x)

Output

stats
{"min": 2.0, "min_idx": 8, "max": 87.0, "max_idx": 3, "avg": 32.8, "stdev": 28.503633853548269, "variance": 812.45714285714291, "sum": 492.0, "len": 15}

The following query creates a series of the average taxi fare per minute, and then calculates statistics on these average fares:

nyc_taxi
| make-series Series=avg(fare_amount) on pickup_datetime step 1min
| project Stats=series_stats_dynamic(Series)

Output

Stats
{"min":0,"min_idx":96600,"max":"31.779069767441861","max_idx":481260,"avg":"13.062685479531414","stdev":"1.7730590207741219","variance":"3.1437382911484884","sum":"6865747.488041711","len":525600}
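
The ignore_nonfinite parameter isn’t shown in the examples above. In the following sketch, the first column should be null because the array contains a null element, while the second should hold statistics computed over the finite values only, per the parameter description above:

print x = dynamic([1, 2, null, 4])
| project stats_default = series_stats_dynamic(x), stats_ignoring_nulls = series_stats_dynamic(x, true)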

16.46 - series_stats()

Learn how to use the series_stats() function to calculate the statistics for a numerical series using multiple columns.

Returns statistics for a numerical series in a table with a column for each statistic.

Syntax

... | extend (Name1, Name2, ...) = series_stats(series [, ignore_nonfinite])

Parameters

NameTypeRequiredDescription
NamestringThe column labels for the output table. If not provided, the system will generate them. If you provide a limited number of names, the table will show only those columns.
seriesdynamic✔️An array of numeric values.
ignore_nonfiniteboolDetermines if the calculation includes non-finite values like null, NaN, inf, and so on. The default is false, which will result in null if non-finite values are present.

Returns

A table with a column for each of the statistics displayed in the following table.

StatisticDescription
minThe minimum value in the input array.
min_idxThe first position of the minimum value in the input array.
maxThe maximum value in the input array.
max_idxThe first position of the maximum value in the input array.
avgThe average value of the input array.
varianceThe sample variance of input array.
stdevThe sample standard deviation of the input array.

Example

print x=dynamic([23, 46, 23, 87, 4, 8, 3, 75, 2, 56, 13, 75, 32, 16, 29]) 
| project series_stats(x)

Output

series_stats_x_minseries_stats_x_min_idxseries_stats_x_maxseries_stats_x_max_idxseries_stats_x_avgseries_stats_x_stdevseries_stats_x_variance
2887332.828.5036338535483812.457142857143
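
To control the output column names, list them on the left side of the assignment, as described for the Name parameter above. A sketch with arbitrary names, assigned positionally in the order min, min_idx, max, max_idx, avg, stdev, variance (matching the default output above):

print x=dynamic([23, 46, 23, 87, 4, 8, 3, 75, 2, 56, 13, 75, 32, 16, 29])
| extend (xMin, xMinIdx, xMax, xMaxIdx, xAvg, xStdev, xVariance) = series_stats(x)
| project xMin, xMax, xAvg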

16.47 - series_subtract()

Learn how to use the series_subtract() function to calculate the element-wise subtraction of two numeric series inputs.

Calculates the element-wise subtraction of two numeric series inputs.

Syntax

series_subtract(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️Arrays of numeric values, the second array to be element-wise subtracted from the first array.

Returns

A dynamic array of calculated element-wise subtract operation between the two inputs. Any non-numeric element or non-existing element, such as in the case of arrays of different sizes, yields a null element value.

Example

range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project s1 = pack_array(x,y,z), s2 = pack_array(z, y, x)
| extend s1_subtract_s2 = series_subtract(s1, s2)

Output

s1s2s1_subtract_s2
[1,2,4][4,2,1][-3,0,3]
[2,4,8][8,4,2][-6,0,6]
[3,6,12][12,6,3][-9,0,9]

16.48 - series_sum()

Learn how to use the series_sum() function to calculate the sum of series elements.

Calculates the sum of series elements.

Syntax

series_sum(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️Array of numeric values.

Returns

Returns a double type value with the sum of the elements of the array.

Example

print arr=dynamic([1,2,3,4]) 
| extend series_sum=series_sum(arr)

Output

s1series_sum
[1,2,3,4]10

16.49 - series_tan()

Learn how to use the series_tan() function to calculate the element-wise tangent of the numeric series input.

Calculates the element-wise tangent of the numeric series input.

Syntax

series_tan(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values on which the tangent function is applied.

Returns

A dynamic array of calculated tangent function values. Any non-numeric element yields a null element value.

Example

print arr = dynamic([-1, 0, 1])
| extend arr_tan = series_tan(arr)

Output

arrarr_tan
[-1,0,1][-1.5574077246549023,0.0,1.5574077246549023]

16.50 - series_asin()

Learn how to use the series_asin() function to calculate the element-wise arcsine function of the numeric series input.

Calculates the element-wise arcsine function of the numeric series input.

Syntax

series_asin(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values over which the arcsine function is applied.

Returns

Dynamic array of calculated arcsine function values. Any non-numeric element yields a null element value.

Example

The following example creates a dynamic array, arr, with the value [-1,0,1]. It then extends the results with column arr_asin, containing the results of the series_asin() function applied to the arr array.

print arr = dynamic([-1,0,1])
| extend arr_asin = series_asin(arr)

Output

arrarr_asin
[-1,0,1]["-1.5707963267948966",0,“1.5707963267948966”]

16.51 - series_ceiling()

Learn how to use the series_ceiling() function to calculate the element-wise ceiling function of the numeric series input.

Calculates the element-wise ceiling function of the numeric series input.

Syntax

series_ceiling(series)

Parameters

NameTypeRequiredDescription
seriesdynamic✔️An array of numeric values over which the ceiling function is applied.

Returns

Dynamic array of the calculated ceiling function. Any non-numeric element yields a null element value.

Example

print s = dynamic([-1.5,1,2.5])
| extend s_ceiling = series_ceiling(s)

Output

ss_ceiling
[-1.5,1,2.5][-1.0,1.0,3.0]

16.52 - series_pow()

Learn how to use the series_pow() function to calculate the element-wise power of two numeric series inputs.

Calculates the element-wise power of two numeric series inputs.

Syntax

series_pow(series1, series2)

Parameters

NameTypeRequiredDescription
series1, series2dynamic✔️Arrays of numeric values. The first array, or base, is element-wise raised to the power of the second array, or power, into a dynamic array result.

Returns

A dynamic array of calculated element-wise power operation between the two inputs. Any non-numeric element or non-existing element, such as in the case of arrays of different sizes, yields a null element value.

Example

print x = dynamic([1, 2, 3, 4]), y=dynamic([1, 2, 3, 0.5])
| extend x_pow_y = series_pow(x, y) 

Output

xyx_pow_y
[1,2,3,4][1,2,3,0.5][1.0,4.0,27.0,2.0]

17 - Window functions

17.1 - next()

Learn how to use the next() function to return the value of the next column at an offset.

Returns the value of a column in a row that is at some offset following the current row in a serialized row set.

Syntax

next(column, [ offset ], [ default_value ])

Parameters

NameTypeRequiredDescription
columnstring✔️The column from which to get the values.
offsetintThe number of rows to move forward from the current row. Default is 1.
default_valuescalarThe default value when there’s no value in the next row. When no default value is specified, null is used.

Examples

Filter data based on comparison between adjacent rows

The following query returns rows that show breaks longer than a quarter of a second between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
| where timeDiffInMilliseconds > 250

Output

TimestampSensorNameValuePublisherIdMachineIdtimeDiff
2022-04-13T00:58:53.048506Zsensor-90.39217481975439894fdbd39ab-82ac-4ca0-99ed-2f83daf3f9bbM100251
2022-04-13T01:07:09.63713Zsensor-90.46645392778288297e3ed081e-501b-4d59-8e60-8524633d9131M100313
2022-04-13T01:07:10.858267Zsensor-90.693091598493419278ca033-2b5e-4f2c-b493-00319b275aeaM100254
2022-04-13T01:07:11.203834Zsensor-90.524158088402497784ea27181-392d-4947-b811-ad5af02a54bbM100331
2022-04-13T01:07:14.431908Zsensor-90.354306454054520af415c2-59dc-4a50-89c3-9a18ae5d621fM100268

Perform aggregation based on comparison between adjacent rows

The following query calculates the average time difference in milliseconds between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', next(Timestamp, 1), Timestamp)
| summarize avg(timeDiffInMilliseconds)

Output

avg_timeDiffInMilliseconds
30.726900061254298

Extend row with data from the next row

In the following query, as part of the serialization done with the serialize operator, a new column next_session_type is added with data from the next row.

ConferenceSessions
| where conference == 'Build 2019'
| serialize next_session_type = next(session_type)
| project time_and_duration, session_title, session_type, next_session_type

Output

time_and_durationsession_titlesession_typenext_session_type
Mon, May 6, 8:30-10:00 amVision Keynote - Satya NadellaKeynoteExpo Session
Mon, May 6, 1:20-1:40 pmAzure Data Explorer: Advanced Time Series analysisExpo SessionBreakout
Mon, May 6, 2:00-3:00 pmAzure’s Data Platform - Powering Modern Applications and Cloud Scale Analytics at Petabyte ScaleBreakoutExpo Session
Mon, May 6, 4:00-4:20 pmHow BASF is using Azure Data ServicesExpo SessionExpo Session
Mon, May 6, 6:50 - 7:10 pmAzure Data Explorer: Operationalize your ML modelsExpo SessionExpo Session
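
The preceding examples rely on the default offset of 1 and don’t use default_value. The following minimal sketch is self-contained, built on the range operator, and passes an explicit offset of 2 together with a default of -1 for rows that have no row two positions ahead:

range x from 1 to 5 step 1
| serialize
| extend next_x = next(x, 2, -1)

For x values 1 through 3, next_x is 3, 4, and 5; for x values 4 and 5 there is no row two positions ahead, so next_x falls back to -1.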

17.2 - prev()

Learn how to use the prev() function to return the value of a specific column in a specified row.

Returns the value of a specific column in a specified row. The specified row is at a specified offset from the current row in a serialized row set.

Syntax

prev(column, [ offset ], [ default_value ] )

Parameters

NameTypeRequiredDescription
columnstring✔️The column from which to get the values.
offsetintThe offset to go back in rows. The default is 1.
default_valuescalarThe default value to be used when there are no previous rows from which to take the value. The default is null.

Examples

Filter data based on comparison between adjacent rows

The following query returns rows that show breaks longer than a quarter of a second between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', Timestamp, prev(Timestamp, 1))
| where timeDiffInMilliseconds > 250

Output

TimestampSensorNameValuePublisherIdMachineIdtimeDiff
2022-04-13T00:58:53.048506Zsensor-90.39217481975439894fdbd39ab-82ac-4ca0-99ed-2f83daf3f9bbM100251
2022-04-13T01:07:09.63713Zsensor-90.46645392778288297e3ed081e-501b-4d59-8e60-8524633d9131M100313
2022-04-13T01:07:10.858267Zsensor-90.693091598493419278ca033-2b5e-4f2c-b493-00319b275aeaM100254
2022-04-13T01:07:11.203834Zsensor-90.524158088402497784ea27181-392d-4947-b811-ad5af02a54bbM100331
2022-04-13T01:07:14.431908Zsensor-90.354306454054520af415c2-59dc-4a50-89c3-9a18ae5d621fM100268

Perform aggregation based on comparison between adjacent rows

The following query calculates the average time difference in milliseconds between calls to sensor-9.

TransformedSensorsData
| where SensorName == 'sensor-9'
| sort by Timestamp asc
| extend timeDiffInMilliseconds = datetime_diff('millisecond', Timestamp, prev(Timestamp, 1))
| summarize avg(timeDiffInMilliseconds)

Output

avg_timeDiffInMilliseconds
30.726900061254298

Extend row with data from the previous row

In the following query, as part of the serialization done with the serialize operator, a new column previous_session_type is added with data from the previous row. Since there was no session prior to the first session, the column is empty in the first row.

ConferenceSessions
| where conference == 'Build 2019'
| serialize previous_session_type = prev(session_type)
| project time_and_duration, session_title, session_type, previous_session_type

Output

time_and_durationsession_titlesession_typeprevious_session_type
Mon, May 6, 8:30-10:00 amVision Keynote - Satya NadellaKeynote
Mon, May 6, 1:20-1:40 pmAzure Data Explorer: Advanced Time Series analysisExpo SessionKeynote
Mon, May 6, 2:00-3:00 pmAzure’s Data Platform - Powering Modern Applications and Cloud Scale Analytics at Petabyte ScaleBreakoutExpo Session
Mon, May 6, 4:00-4:20 pmHow BASF is using Azure Data ServicesExpo SessionBreakout
Mon, May 6, 6:50 - 7:10 pmAzure Data Explorer: Operationalize your ML modelsExpo SessionExpo Session
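
As with next(), the examples above don’t use default_value. The following self-contained sketch uses the range operator and falls back to 0 for the first row, which has no previous row:

range x from 1 to 5 step 1
| serialize
| extend prev_x = prev(x, 1, 0)

prev_x is 0 for the first row and 1 through 4 for the remaining rows.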

17.3 - row_cumsum()

Learn how to use the row_cumsum() function to calculate the cumulative sum of a column in a serialized row set.

Calculates the cumulative sum of a column in a serialized row set.

Syntax

row_cumsum( term [, restart] )

Parameters

NameTypeRequiredDescription
termint, long, or real✔️The expression indicating the value to be summed.
restartboolIndicates when the accumulation operation should be restarted, or set back to 0. It can be used to indicate partitions in the data.

Returns

The function returns the cumulative sum of its argument.

Examples

The following example shows how to calculate the cumulative sum of the first few even integers.

datatable (a:long) [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10
]
| where a%2==0
| serialize cs=row_cumsum(a)

Output

acs
22
46
612
820
1030

This example shows how to calculate the cumulative sum (here, of salary) when the data is partitioned (here, by name):

datatable (name:string, month:int, salary:long)
[
    "Alice", 1, 1000,
    "Bob",   1, 1000,
    "Alice", 2, 2000,
    "Bob",   2, 1950,
    "Alice", 3, 1400,
    "Bob",   3, 1450,
]
| order by name asc, month asc
| extend total=row_cumsum(salary, name != prev(name))

Output

namemonthsalarytotal
Alice110001000
Alice220003000
Alice314004400
Bob110001000
Bob219502950
Bob314504400

17.4 - row_number()

Learn how to use the row_number() function to return the current row’s index in a serialized row set.

Returns the current row’s index in a serialized row set.

The row index starts by default at 1 for the first row, and is incremented by 1 for each additional row. Optionally, the row index can start at a different value than 1. Additionally, the row index may be reset according to some provided predicate.

Syntax

row_number( [StartingIndex [, Restart]] )

Parameters

NameTypeRequiredDescription
StartingIndexlongThe value of the row index to start at or restart to. The default value is 1.
RestartboolIndicates when the numbering is to be restarted to the StartingIndex value. The default is false.

Returns

The function returns the row index of the current row as a value of type long.

Examples

The following example returns a table with two columns, the first column (a) with numbers from 10 down to 1, and the second column (rn) with numbers from 1 up to 10:

range a from 1 to 10 step 1
| sort by a desc
| extend rn=row_number()

The following example is similar to the above, only the second column (rn) starts at 7:

range a from 1 to 10 step 1
| sort by a desc
| extend rn=row_number(7)

The last example shows how to partition the data and number the rows within each partition. Here, we partition the data by Airport:

datatable (Airport:string, Airline:string, Departures:long)
[
  "TLV", "LH", 1,
  "TLV", "LY", 100,
  "SEA", "LH", 1,
  "SEA", "BA", 2,
  "SEA", "LY", 0
]
| sort by Airport asc, Departures desc
| extend Rank=row_number(1, prev(Airport) != Airport)

Running this query produces the following result:

AirportAirlineDeparturesRank
SEABA21
SEALH12
SEALY03
TLVLY1001
TLVLH12

17.5 - row_rank_dense()

Learn how to use the row_rank_dense() function to return the current row’s dense rank in a serialized row set.

Returns the current row’s dense rank in a serialized row set.

The row rank starts by default at 1 for the first row, and is incremented by 1 whenever the provided Term is different from the previous row’s Term.

Syntax

row_rank_dense ( Term [, Restart] )

Parameters

NameTypeRequiredDescription
Termstring✔️An expression indicating the value to consider for the rank. The rank is increased whenever the Term changes.
RestartboolIndicates when the rank is to be restarted to 1. It can be used to indicate partitions in the data. The default is false.

Returns

Returns the row rank of the current row as a value of type long.

Example

The following query shows how to rank the Airline by the number of departures from the SEA Airport using dense rank.

datatable (Airport:string, Airline:string, Departures:long)
[
  "SEA", "LH", 3,
  "SEA", "LY", 100,
  "SEA", "UA", 3,
  "SEA", "BA", 2,
  "SEA", "EL", 3
]
| sort by Departures asc
| extend Rank=row_rank_dense(Departures)

Output

AirportAirlineDeparturesRank
SEABA21
SEALH32
SEAUA32
SEAEL32
SEALY1003

The following example shows how to rank the Airline by the number of departures within each partition. Here, we partition the data by Airport:

datatable (Airport:string, Airline:string, Departures:long)
[
  "SEA", "LH", 3,
  "SEA", "LY", 100,
  "SEA", "UA", 3,
  "SEA", "BA", 2,
  "SEA", "EL", 3,
  "AMS", "EL", 1,
  "AMS", "BA", 1
]
| sort by Airport desc, Departures asc
| extend Rank=row_rank_dense(Departures, prev(Airport) != Airport)

Output

AirportAirlineDeparturesRank
SEABA21
SEALH32
SEAUA32
SEAEL32
SEALY1003
AMSEL11
AMSBA11

17.6 - row_rank_min()

Learn how to use the row_rank_min() function to return the current row’s minimal rank in a serialized row set.

Returns the current row’s minimal rank in a serialized row set.

The rank is the minimal row number at which the current row’s Term value first appears.

Syntax

row_rank_min ( Term [, Restart] )

Parameters

NameTypeRequiredDescription
Termstring✔️An expression indicating the value to consider for the rank. The rank is the minimal row number for Term.
RestartboolIndicates when the rank is to be restarted to 1. It can be used to indicate partitions in the data. The default is false.

Returns

Returns the row rank of the current row as a value of type long.

Example

The following query shows how to rank the Airline by the number of departures from the SEA Airport.

datatable (Airport:string, Airline:string, Departures:long)
[
  "SEA", "LH", 3,
  "SEA", "LY", 100,
  "SEA", "UA", 3,
  "SEA", "BA", 2,
  "SEA", "EL", 3
]
| sort by Departures asc
| extend Rank=row_rank_min(Departures)

Output

AirportAirlineDeparturesRank
SEABA21
SEALH32
SEAUA32
SEAEL32
SEALY1005
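
The example above doesn’t use the Restart argument. Following the same pattern shown for row_rank_dense(), the sketch below restarts the minimal rank for each Airport; the data is illustrative:

datatable (Airport:string, Airline:string, Departures:long)
[
  "SEA", "LH", 3,
  "SEA", "LY", 100,
  "SEA", "UA", 3,
  "SEA", "BA", 2,
  "AMS", "EL", 1,
  "AMS", "BA", 1
]
| sort by Airport desc, Departures asc
| extend Rank=row_rank_min(Departures, prev(Airport) != Airport)

Within SEA, BA gets rank 1, the two airlines with 3 departures share rank 2, and LY gets rank 4; within AMS, both airlines share rank 1.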

17.7 - row_window_session()

Learn how to use the row_window_session() function to calculate session start values of a column in a serialized row set.

Calculates session start values of a column in a serialized row set.

Syntax

row_window_session ( Expr , MaxDistanceFromFirst , MaxDistanceBetweenNeighbors [, Restart] )

Parameters

NameTypeRequiredDescription
Exprdatetime✔️An expression whose values are grouped together in sessions. When Expr results in a null value, the next value starts a new session.
MaxDistanceFromFirsttimespan✔️Determines when a new session starts using the maximum distance between the current Expr value and its value at the beginning of the session.
MaxDistanceBetweenNeighborstimespan✔️Another criterion for starting a new session using the maximum distance from one value of Expr to the next.
RestartbooleanIf specified, every value that evaluates to true immediately restarts the session.

Returns

The function returns the values at the beginning of each session. It uses the following conceptual calculation model:

  1. Iterates over the input sequence of Expr values in order.

  2. For each value, it decides whether to create a new session.

  3. If a new session is created, the function returns the current value of Expr. Otherwise, it returns the value it returned for the previous row, that is, the value at the start of the current session.

A new session starts when the distance from the first value of the current session exceeds MaxDistanceFromFirst, when the distance from the previous value exceeds MaxDistanceBetweenNeighbors, or when Restart evaluates to true.

Examples

The following example calculates session start values for a table that has an ID column and a Timestamp column recording the time of each record. The data is sorted by ID and timestamp, and the query returns the ID, Timestamp, and a new SessionStarted column. A session can’t exceed one hour, and it continues for as long as consecutive records with the same ID are less than five minutes apart.

datatable (ID:string, Timestamp:datetime) [
    "1", datetime(2024-04-11 10:00:00),
    "2", datetime(2024-04-11 10:18:00),
    "1", datetime(2024-04-11 11:00:00),
    "3", datetime(2024-04-11 11:30:00),
    "2", datetime(2024-04-11 13:30:00),
    "2", datetime(2024-04-11 10:16:00)
]
| sort by ID asc, Timestamp asc
| extend SessionStarted = row_window_session(Timestamp, 1h, 5m, ID != prev(ID))

Output

IDTimestampSessionStarted
12024-04-11T10:00:00Z2024-04-11T10:00:00Z
12024-04-11T11:00:00Z2024-04-11T11:00:00Z
22024-04-11T10:16:00Z2024-04-11T10:16:00Z
22024-04-11T10:18:00Z2024-04-11T10:16:00Z
22024-04-11T13:30:00Z2024-04-11T13:30:00Z
32024-04-11T11:30:00Z2024-04-11T11:30:00Z

17.8 - Window functions

Learn how to use window functions on rows in a serialized set.

Window functions operate on multiple rows (records) in a row set at a time. Unlike aggregation functions, window functions require that the rows in the row set be serialized (have a specific order to them). Window functions may depend on the order to determine the result.

Window functions can only be used on serialized sets. The easiest way to serialize a row set is to use the serialize operator. This operator “freezes” the order of rows in an arbitrary manner. If the order of serialized rows is semantically important, use the sort operator to force a particular order.
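
As a minimal self-contained sketch (the column names are illustrative), the serialize operator below freezes the row order produced by the range operator so that a window function such as row_number() can be applied:

range x from 1 to 5 step 1
| serialize
| extend rn = row_number()

Replace serialize with a sort by clause whenever a specific order is required.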

The serialization process has a non-trivial cost associated with it. For example, it might prevent query parallelism in many scenarios. Therefore, don’t apply serialization unnecessarily. If necessary, rearrange the query to perform serialization on the smallest row set possible.

Serialized row set

An arbitrary row set (such as a table, or the output of a tabular operator) can be serialized in one of the following ways:

  1. By sorting the row set. See below for a list of operators that emit sorted row sets.
  2. By using the serialize operator.

Many tabular operators serialize output whenever the input is already serialized, even if the operator doesn’t itself guarantee that the result is serialized. For example, this property is guaranteed for the extend operator, the project operator, and the where operator.
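
The following self-contained sketch illustrates this property: the row set is serialized by sort, and the subsequent where and extend operators keep that order, so row_number() can still be applied:

range x from 1 to 10 step 1
| sort by x desc
| where x % 2 == 0
| extend rn = row_number()

The result contains the x values 10, 8, 6, 4, and 2 with rn values 1 through 5.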

Operators that emit serialized row sets by sorting

Operators that preserve the serialized row set property

18 - Add a comment in KQL

Learn how to add comments in Kusto Query Language.

Indicates user-provided text. Comments can be inserted on a separate line, at the end of a line, or within a KQL query or command. The comment text isn’t evaluated.

Syntax

// comment

Remarks

Use two forward slashes (//) to add comments. The following table lists the keyboard shortcuts that you can use to comment or uncomment text.

Hot KeyDescription
Ctrl+K+CComment current line or selected lines.
Ctrl+K+UUncomment current line or selected lines.

Example

This example returns a count of events in New York state:

// Return the count of events in the New York state from the StormEvents table
StormEvents
| where State == "NEW YORK" // Filter the records where the State is "NEW YORK"
| count

19 - Debug Kusto Query Language inline Python using Visual Studio Code

Learn how to debug Kusto Query Language (KQL) inline Python using Visual Studio Code.

You can embed Python code in Kusto Query Language queries using the python() plugin. The plugin runtime is hosted in a sandbox, an isolated and secure Python environment. The python() plugin capability extends Kusto Query Language native functionalities with the huge archive of OSS Python packages. This extension enables you to run advanced algorithms, such as machine learning, artificial intelligence, statistical, and time series as part of the query.

Prerequisites

Enable Python debugging in Visual Studio Code

  1. In your client application, prefix a query containing inline Python with set query_python_debug;

  2. Run the query.

    • Kusto Explorer: Visual Studio Code is automatically launched with the debug_python.py script.
    • Kusto Web UI or KQL queryset:
      1. Download and save debug_python.py, df.txt, and kargs.txt. When prompted, select Allow, and save the files in the selected directory.
      2. Right-click debug_python.py and open it with Visual Studio Code. The debug_python.py script contains the inline Python code from the KQL query, prefixed by template code that initializes the input dataframe from df.txt and the dictionary of parameters from kargs.txt.
  3. In Visual Studio Code, start the debugger: select Run > Start Debugging (F5) and choose the Python configuration. The debugger launches and automatically sets a breakpoint to debug the inline code.


How does inline Python debugging in Visual Studio Code work?

  1. The query is parsed and executed in the server until the | evaluate python() clause is reached.
  2. The Python sandbox is invoked but instead of running the code, it serializes the input table, the dictionary of parameters, and the code, and sends them back to the client.
  3. These three objects are saved in three files: df.txt, kargs.txt, and debug_python.py in the selected directory (Web UI) or in the client %TEMP% directory (Kusto Explorer).
  4. Visual Studio Code is launched, preloaded with the debug_python.py file that contains a prefix code to initialize df and kargs from their respective files, followed by the Python script embedded in the KQL query.

Query example

  1. Run the following KQL query in your client application:

    range x from 1 to 4 step 1
    | evaluate python(typeof(*, x4:int), 
    'exp = kargs["exp"]\n'
    'result = df\n'
    'result["x4"] = df["x"].pow(exp)\n'
    , bag_pack('exp', 4))
    

    See the resulting table:

    xx4
    11
    216
    381
    4256
  2. Run the same KQL query in your client application using set query_python_debug;:

    set query_python_debug;
    range x from 1 to 4 step 1
    | evaluate python(typeof(*, x4:int), 
    'exp = kargs["exp"]\n'
    'result = df\n'
    'result["x4"] = df["x"].pow(exp)\n'
    , bag_pack('exp', 4))
    
  3. Visual Studio Code is launched:

    Screenshot showing Visual Studio Code launched with the debug_python.py script.

  4. Visual Studio Code debugs and prints the result dataframe in the debug console:

    Screenshot showing the result dataframe printed in the Visual Studio Code debug console.

20 - Set timeouts

Learn how to set the query timeout length in various tools, such as Kusto.Explorer and the Azure Data Explorer web UI.

It’s possible to customize the timeout length for your queries and management commands. In this article, you’ll learn how to set a custom timeout in various tools such as the Azure Data Explorer web UI, Kusto.Explorer, Kusto.Cli, Power BI, and when using an SDK. Certain tools have their own default timeout values, but it may be helpful to adjust these values based on the complexity and expected runtime of your queries.

Azure Data Explorer web UI

This section describes how to configure a custom query timeout and admin command timeout in the Azure Data Explorer web UI.

Prerequisites

  • A Microsoft account or a Microsoft Entra user identity. An Azure subscription isn’t required.
  • An Azure Data Explorer cluster and database. Create a cluster and database.

Set timeout length

  1. Sign in to the Azure Data Explorer web UI with your Microsoft account or Microsoft Entra user identity credentials.

  2. In the top menu, select the Settings icon.

  3. From the left menu, select Connection.

  4. Under the Query timeout (in minutes) setting, use the slider to choose the desired query timeout length.

  5. Under the Admin command timeout (in minutes) setting, use the slider to choose the desired admin command timeout length.

    Screenshot of the settings in the Azure Data Explorer web UI that control timeout length.

  6. Close the settings window, and the changes will be saved automatically.

Kusto.Explorer

This section describes how to configure a custom query timeout and admin command timeout in Kusto.Explorer.

Prerequisites

Set timeout length

  1. Open the Kusto.Explorer tool.

  2. In the top menu, select the Tools tab.

  3. On the right-hand side, select Options.

    Screenshot showing the options widget in the Kusto.Explorer tool.

  4. In the left menu, select Connections.

  5. In the Query Server Timeout setting, enter the desired timeout length. The maximum is 1 hour.

  6. Under the Admin Command Server Timeout setting, enter the desired timeout length. The maximum is 1 hour.

    Screenshot showing settings that control the timeout length in Kusto.Explorer.

  7. Select OK to save the changes.

Kusto.Cli

This section describes how to configure a custom server timeout in Kusto.Cli.

Prerequisites

Set timeout length

Run the following command to set the servertimeout client request property with the desired timeout length as a valid timespan value up to 1 hour.

Kusto.Cli.exe <ConnectionString> -execute:"#crp servertimeout=<timespan>" -execute:"…"

Alternatively, use the following command to set the norequesttimeout client request property, which will set the timeout to the maximum value of 1 hour.

Kusto.Cli.exe <ConnectionString> -execute:"#crp norequesttimeout=true" -execute:"…"

Once set, the client request property applies to all subsequent requests until the app is restarted or another value is set. To retrieve the current value, use:

Kusto.Cli.exe <ConnectionString> -execute:"#crp servertimeout"

Power BI

This section describes how to configure a custom server timeout in Power BI.

Prerequisites

Set timeout length

  1. Connect to your Azure Data Explorer cluster from Power BI desktop.

  2. In the top menu, select Transform Data.

    Screenshot of the transform data option in Power BI Desktop.

  3. In the top menu, select Advanced Query Editor.

    Screenshot of the Power BI advanced query editor option in Power BI Desktop.

  4. In the pop-up window, set the timeout option in the fourth parameter of the AzureDataExplorer.Contents method. The following example shows how to set a timeout length of 59 minutes.

    let 
        Source = AzureDataExplorer.Contents(<cluster>, <database>, <table>, [Timeout=#duration(0,0,59,0)])
    in
        Source
    
  5. Select Done to apply the changes.

SDKs

To learn how to set timeouts with the SDKs, see Customize query behavior with client request properties.

21 - Syntax conventions for reference documentation

Learn about the syntax conventions for the Kusto Query Language and management command documentation.

This article outlines the syntax conventions followed in the Kusto Query Language (KQL) and management commands reference documentation.

A good place to start learning Kusto Query Language is to understand the overall query structure. The first thing you notice when looking at a Kusto query is the use of the pipe symbol (|). The structure of a Kusto query starts with getting your data from a data source and then passing the data across a pipeline, and each step provides some level of processing and then passes the data to the next step. At the end of the pipeline, you get your final result. In effect, this is our pipeline:

Get Data | Filter | Summarize | Sort | Select

This concept of passing data down the pipeline makes for an intuitive structure, as it’s easy to create a mental picture of your data at each step.

To illustrate this, let’s take a look at the following query, which looks at Microsoft Entra sign-in logs. As you read through each line, you can see the keywords that indicate what’s happening to the data. We’ve included the relevant stage in the pipeline as a comment in each line.

SigninLogs                              // Get data
| evaluate bag_unpack(LocationDetails)  // Ignore this line for now; we'll come back to it at the end.
| where RiskLevelDuringSignIn == 'none' // Filter
   and TimeGenerated >= ago(7d)         // Filter
| summarize Count = count() by city     // Summarize
| sort by Count desc                    // Sort
| take 5                                // Select

Because the output of every step serves as the input for the following step, the order of the steps can determine the query’s results and affect its performance. It’s crucial that you order the steps according to what you want to get out of the query.

Syntax conventions

ConventionDescription
BlockString literals to be entered exactly as shown.
ItalicParameters to be provided a value upon use of the function or command.
[ ]Denotes that the enclosed item is optional.
( )Denotes that at least one of the enclosed items is required.
| (pipe)Used within square or round brackets to denote that you may specify one of the items separated by the pipe character. In this form, the pipe is equivalent to the logical OR operator. When shown in block text, the pipe character is part of the KQL query syntax itself, as in the tabular operator example later in this article.
[, …]Indicates that the preceding parameter can be repeated multiple times, separated by commas.
;Query statement terminator.

Examples

Scalar function

This example shows the syntax and an example usage of the hash function, followed by an explanation of how each syntax component translates into the example usage.

Syntax

hash(source [, mod])

Example usage

hash("World")
  • The name of the function, hash, and the opening parenthesis are entered exactly as shown.
  • “World” is passed as an argument for the required source parameter.
  • No argument is passed for the mod parameter, which is optional as indicated by the square brackets. A variant that does pass mod is sketched after this list.
  • The closing parenthesis is entered exactly as shown.
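
For contrast, the following sketch also supplies the optional mod parameter, which maps the hash result onto the range 0 to mod - 1 (the value 100 is chosen only for illustration):

print h = hash("World", 100)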

Tabular operator

This example shows the syntax and an example usage of the sort operator, followed by an explanation of how each syntax component translates into the example usage.

Syntax

T | sort by column [asc | desc] [nulls first | nulls last] [, …]

Example usage

StormEvents
| sort by State asc, StartTime desc
  • The StormEvents table is passed as an argument for the required T parameter.
  • | sort by is entered exactly as shown. In this case, the pipe character is part of the tabular expression statement syntax, as represented by the block text. To learn more, see What is a query statement.
  • The State column is passed as an argument for the required column parameter with the optional asc flag.
  • After a comma, another set of arguments is passed: the StartTime column with the optional desc flag. The [, …] syntax indicates that more argument sets may be passed but aren’t required.

Working with optional parameters

To provide an argument for an optional parameter that comes after another optional parameter, you must provide an argument for the prior parameter. This requirement is because arguments must follow the order specified in the syntax. If you don’t have a specific value to pass for the parameter, use an empty value of the same type.

Example of sequential optional parameters

Consider the syntax for the http_request plugin:

evaluate http_request ( Uri [, RequestHeaders [, Options]] )

RequestHeaders and Options are optional parameters of type dynamic. To provide an argument for the Options parameter, you must also provide an argument for the RequestHeaders parameter. The following example shows how to provide an empty value for the first optional parameter, RequestHeaders, in order to be able to specify a value for the second optional parameter, Options.

evaluate http_request ( "https://contoso.com/", dynamic({}), dynamic({ "EmployeeName": "Nicole" }) )

22 - T-SQL

This article describes T-SQL.

The query editor supports the use of T-SQL in addition to its primary query language, Kusto Query Language (KQL). While KQL is the recommended query language, T-SQL can be useful for tools that are unable to use KQL.

Query with T-SQL

To run a T-SQL query, begin the query with an empty T-SQL comment line: --. The -- syntax tells the query editor to interpret the following query as T-SQL and not KQL.

Example

--
SELECT * FROM StormEvents

T-SQL to Kusto Query Language

The query editor supports the ability to translate T-SQL queries into KQL. This translation feature can be helpful for users who are familiar with SQL and want to learn more about KQL.

To get the equivalent KQL for a T-SQL SELECT statement, add the keyword explain before the query. The output will be the KQL version of the query, which can be useful for understanding the corresponding KQL syntax and concepts.

Remember to preface T-SQL queries with a T-SQL comment line, --, to tell the query editor to interpret the following query as T-SQL and not KQL.

Example

--
explain
SELECT top(10) *
FROM StormEvents
ORDER BY DamageProperty DESC

Output

StormEvents
| project
    StartTime,
    EndTime,
    EpisodeId,
    EventId,
    State,
    EventType,
    InjuriesDirect,
    InjuriesIndirect,
    DeathsDirect,
    DeathsIndirect,
    DamageProperty,
    DamageCrops,
    Source,
    BeginLocation,
    EndLocation,
    BeginLat,
    BeginLon,
    EndLat,
    EndLon,
    EpisodeNarrative,
    EventNarrative,
    StormSummary
| sort by DamageProperty desc nulls first
| take int(10)

Run stored functions

When using T-SQL, we recommend that you create optimized KQL queries and encapsulate them in stored functions, as doing so minimizes T-SQL code and may increase performance. For example, if you have a stored function as described in the following table, you can execute it as shown in the code example.

NameParametersBodyFolderDocString
MyFunction(myLimit: long){StormEvents | take myLimit}MyFolderDemo function with parameter
SELECT * FROM kusto.MyFunction(10)
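
As a sketch, the stored function in this table could be created with a KQL management command along the following lines; the function name, parameter, body, folder, and docstring all come from the table above:

.create-or-alter function with (folder = "MyFolder", docstring = "Demo function with parameter")
MyFunction(myLimit: long) {
    StormEvents
    | take myLimit
}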

Set request properties

Request properties control how a query executes and returns results. To set request properties with T-SQL, preface your query with one or more statements with the following syntax:

Syntax

DECLARE @__kql_set_requestPropertyName type = value;

Parameters

NameTypeRequiredDescription
requestPropertyNamestring✔️The name of the request property to set.
typestring✔️The T-SQL data type of the value.
valuescalar✔️The value to assign to the request property.

Examples

The following table shows examples for how to set request properties with T-SQL.

Request propertyExample
query_datetimescope_toDECLARE @__kql_set_query_datetimescope_to DATETIME = '2023-03-31 03:02:01';
request_app_nameDECLARE @__kql_set_request_app_name NVARCHAR = 'kuku';
query_results_cache_max_ageDECLARE @__kql_set_query_results_cache_max_age TIME = '00:05:00';
truncationmaxsizeDECLARE @__kql_set_truncationmaxsize BIGINT = 4294967297;
maxoutputcolumnsDECLARE @__kql_set_maxoutputcolumns INT = 3001;
notruncationDECLARE @__kql_set_notruncation BIT = 1;
norequesttimeoutDECLARE @__kql_set_norequesttimeout BIT = 0;

To set request properties with KQL, see set statement.
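
For comparison, the KQL equivalent of the notruncation example above sets the property with a set statement placed directly before the query (StormEvents is used here only as a sample table):

set notruncation;
StormEvents
| take 2000000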

Coverage

The query environment offers limited support for T-SQL. The following table outlines the T-SQL statements and features that aren’t supported or are partially supported.

T-SQL statement or featureDescription
CREATE, INSERT, DROP, and ALTERNot supported
Schema or data modificationsNot supported
ANY, ALL, and EXISTSNot supported
WITHIN GROUPNot supported
TOP PERCENTNot supported
TOP WITH TIESEvaluated as regular TOP
TRUNCATEReturns the nearest value
SELECT *Column order may differ from expectation. Use column names if order matters.
AT TIME ZONENot supported
SQL cursorsNot supported
Correlated subqueriesNot supported
Recursive CTEsNot supported
Dynamic statementsNot supported
Flow control statementsOnly IF THEN ELSE statements with an identical schema for THEN and ELSE are supported.
Duplicate column namesNot supported. The original name is preserved for one column.
Data typesData returned may differ in type from SQL Server. For example, TINYINT and SMALLINT have no equivalent in Kusto, and may return as INT32 or INT64 instead of BYTE or INT16.