This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Aggregation functions

1 - Aggregation Functions

Learn how to use aggregation functions to perform calculations on a set of values and return a single value.

An aggregation function performs a calculation on a set of values, and returns a single value. These functions are used in conjunction with the summarize operator. This article lists all available aggregation functions grouped by type. For scalar functions, see Scalar function types.

Binary functions

Function NameDescription
binary_all_and()Returns aggregated value using the binary AND of the group.
binary_all_or()Returns aggregated value using the binary OR of the group.
binary_all_xor()Returns aggregated value using the binary XOR of the group.

Dynamic functions

Function NameDescription
buildschema()Returns the minimal schema that admits all values of the dynamic input.
make_bag(), make_bag_if()Returns a property bag of dynamic values within the group without/with a predicate.
make_list(), make_list_if()Returns a list of all the values within the group without/with a predicate.
make_list_with_nulls()Returns a list of all the values within the group, including null values.
make_set(), make_set_if()Returns a set of distinct values within the group without/with a predicate.

Row selector functions

Function NameDescription
arg_max()Returns one or more expressions when the argument is maximized.
arg_min()Returns one or more expressions when the argument is minimized.
take_any(), take_anyif()Returns a random non-empty value for the group without/with a predicate.

Statistical functions

Function NameDescription
avg()Returns an average value across the group.
avgif()Returns an average value across the group (with predicate).
count(), countif()Returns a count of the group without/with a predicate.
count_distinct(), count_distinctif()Returns a count of unique elements in the group without/with a predicate.
dcount(), dcountif()Returns an approximate distinct count of the group elements without/with a predicate.
hll()Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation.
hll_if()Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation (with predicate).
hll_merge()Returns a value for merged HLL results.
max(), maxif()Returns the maximum value across the group without/with a predicate.
min(), minif()Returns the minimum value across the group without/with a predicate.
percentile()Returns a percentile estimation of the group.
percentiles()Returns percentile estimations of the group.
percentiles_array()Returns the percentile approximates of the array.
percentilesw()Returns the weighted percentile approximate of the group.
percentilesw_array()Returns the weighted percentile approximate of the array.
stdev(), stdevif()Returns the standard deviation across the group for a population that is considered a sample without/with a predicate.
stdevp()Returns the standard deviation across the group for a population that is considered representative.
sum(), sumif()Returns the sum of the elements within the group without/with a predicate.
tdigest()Returns an intermediate result for the percentiles approximation, the weighted percentile approximate of the group.
tdigest_merge()Returns the merged tdigest value across the group.
variance(), varianceif()Returns the variance across the group without/with a predicate.
variancep()Returns the variance across the group for a population that is considered representative.

2 - arg_max() (aggregation function)

Learn how to use the arg_max() aggregation function to find a row in a table that maximizes the input expression.

Finds a row in the table that maximizes the specified expression. It returns all columns of the input table or specified columns.

Syntax

arg_max (ExprToMaximize, * | ExprToReturn [, …])

Parameters

NameTypeRequiredDescription
ExprToMaximizestring✔️The expression for which the maximum value is determined.
ExprToReturnstring✔️The expression determines which columns’ values are returned, from the row that has the maximum value for ExprToMaximize. Use a wildcard * to return all columns.

Returns

Returns a row in the table that maximizes the specified expression ExprToMaximize, and the values of columns specified in ExprToReturn.

Examples

Find maximum latitude

The following example finds the maximum latitude of a storm event in each state.

StormEvents 
| summarize arg_max(BeginLat, BeginLocation) by State

Output

The results table displays only the first 10 rows.

StateBeginLatBeginLocation
MISSISSIPPI34.97BARTON
VERMONT45NORTH TROY
AMERICAN SAMOA-14.2OFU
HAWAII22.2113PRINCEVILLE
MINNESOTA49.35ARNESEN
RHODE ISLAND42WOONSOCKET
INDIANA41.73FREMONT
WEST VIRGINIA40.62CHESTER
SOUTH CAROLINA35.18LANDRUM
TEXAS36.4607DARROUZETT

Find last state fatal event

The following example finds the last time an event with a direct death happened in each state, showing all the columns.

The query first filters the events to include only those events where there was at least one direct death. Then the query returns the entire row with the most recent StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *) by State

Output

The results table displays only the first 10 rows and first three columns.

StateStartTimeEndTime
GUAM2007-01-27T11:15:00Z2007-01-27T11:30:00Z
MASSACHUSETTS2007-02-03T22:00:00Z2007-02-04T10:00:00Z
AMERICAN SAMOA2007-02-17T13:00:00Z2007-02-18T11:00:00Z
IDAHO2007-02-17T13:00:00Z2007-02-17T15:00:00Z
DELAWARE2007-02-25T13:00:00Z2007-02-26T01:00:00Z
WYOMING2007-03-10T17:00:00Z2007-03-10T17:00:00Z
NEW MEXICO2007-03-23T18:42:00Z2007-03-23T19:06:00Z
INDIANA2007-05-15T14:14:00Z2007-05-15T14:14:00Z
MONTANA2007-05-18T14:20:00Z2007-05-18T14:20:00Z
LAKE MICHIGAN2007-06-07T13:00:00Z2007-06-07T13:00:00Z

Handle nulls

The following example demonstrates null handling.

datatable(Fruit: string, Color: string, Version: int) [
    "Apple", "Red", 1,
    "Apple", "Green", int(null),
    "Banana", "Yellow", int(null),
    "Banana", "Green", int(null),
    "Pear", "Brown", 1,
    "Pear", "Green", 2,
]
| summarize arg_max(Version, *) by Fruit

Output

FruitVersionColor
Apple1Red
BananaYellow
Pear2Green

Comparison to max()

The arg_max() function differs from the max() function. The arg_max() function allows you to return other columns along with the maximum value, and max() only returns the maximum value itself.

Examples

arg_max()

Find the last time an event with a direct death happened, showing all the columns in the table.

The query first filters the events to only include events where there was at least one direct death. Then the query returns the entire row with the most recent (maximum) StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *)

The results table returns all the columns for the row containing the highest value in the expression specified.

| StartTime | EndTime | EpisodeId | EventId | State | EventType | … | |–|–|–|–| | 2007-12-31T15:00:00Z | 2007-12-31T15:00:00 | 12688 | 69700 | UTAH | Avalanche | … |

max()

Find the last time an event with a direct death happened.

The query filters events to only include events where there is at least one direct death, and then returns the maximum value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize max(StartTime)

The results table returns the maximum value of StartTime, without returning other columns for this record.

max_StartTime
2007-12-31T15:00:00Z

3 - arg_min() (aggregation function)

Learn how to use the arg_min() aggregation function to find a row in a table that minimizes the input expression.

Finds a row in the table that minimizes the specified expression. It returns all columns of the input table or specified columns.

Syntax

arg_min (ExprToMinimize, * | ExprToReturn [, …])

Parameters

NameTypeRequiredDescription
ExprToMinimizestring✔️The expression for which the minimum value is determined.
ExprToReturnstring✔️The expression determines which columns’ values are returned, from the row that has the minimum value for ExprToMinimize. Use a wildcard * to return all columns.

Null handling

When ExprToMinimize is null for all rows in a table, one row in the table is picked. Otherwise, rows where ExprToMinimize is null are ignored.

Returns

Returns a row in the table that minimizes ExprToMinimize, and the values of columns specified in ExprToReturn. Use or * to return the entire row.

Examples

Find the minimum latitude of a storm event in each state.

StormEvents 
| summarize arg_min(BeginLat, BeginLocation) by State

Output

The results table shown includes only the first 10 rows.

StateBeginLatBeginLocation
AMERICAN SAMOA-14.3PAGO PAGO
CALIFORNIA32.5709NESTOR
MINNESOTA43.5BIGELOW
WASHINGTON45.58WASHOUGAL
GEORGIA30.67FARGO
ILLINOIS37CAIRO
FLORIDA24.6611SUGARLOAF KEY
KENTUCKY36.5HAZEL
TEXAS25.92BROWNSVILLE
OHIO38.42SOUTH PT

Find the first time an event with a direct death happened in each state, showing all of the columns.

The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *) by State

Output

The results table shown includes only the first 10 rows and first 3 columns.

StateStartTimeEndTime
INDIANA2007-01-01T00:00:00Z2007-01-22T18:49:00Z
FLORIDA2007-01-03T10:55:00Z2007-01-03T10:55:00Z
NEVADA2007-01-04T09:00:00Z2007-01-05T14:00:00Z
LOUISIANA2007-01-04T15:45:00Z2007-01-04T15:52:00Z
WASHINGTON2007-01-09T17:00:00Z2007-01-09T18:00:00Z
CALIFORNIA2007-01-11T22:00:00Z2007-01-24T10:00:00Z
OKLAHOMA2007-01-12T00:00:00Z2007-01-18T23:59:00Z
MISSOURI2007-01-13T03:00:00Z2007-01-13T08:30:00Z
TEXAS2007-01-13T10:30:00Z2007-01-13T14:30:00Z
ARKANSAS2007-01-14T03:00:00Z2007-01-14T03:00:00Z

The following example demonstrates null handling.

datatable(Fruit: string, Color: string, Version: int) [
    "Apple", "Red", 1,
    "Apple", "Green", int(null),
    "Banana", "Yellow", int(null),
    "Banana", "Green", int(null),
    "Pear", "Brown", 1,
    "Pear", "Green", 2,
]
| summarize arg_min(Version, *) by Fruit

Output

FruitVersionColor
Apple1Red
BananaYellow
Pear1Brown

Comparison to min()

The arg_min() function differs from the min() function. The arg_min() function allows you to return additional columns along with the minimum value, and min() only returns the minimum value itself.

Examples

arg_min()

Find the first time an event with a direct death happened, showing all the columns in the table.

The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *)

The results table returns all the columns for the row containing the lowest value in the expression specified.

| StartTime | EndTime | EpisodeId | EventId | State | EventType | … | |–|–|–|–| | 2007-01-01T00:00:00Z | 2007-01-22T18:49:00Z | 2408 | 11929 | INDIANA | Flood | … |

min()

Find the first time an event with a direct death happened.

The query filters events to only include those where there is at least one direct death, and then returns the minimum value for StartTime.

StormEvents
| where DeathsDirect > 0
| summarize min(StartTime)

The results table returns the lowest value in the specific column only.

min_StartTime
2007-01-01T00:00:00Z

4 - avg() (aggregation function)

Learn how to use the avg() function to calculate the average value of an expression.

Calculates the average (arithmetic mean) of expr across the group.

Syntax

avg(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation.

Returns

Returns the average value of expr across the group.

Example

The following example returns the average number of damaged crops per state.

StormEvents
| summarize AvgDamageToCrops = avg(DamageCrops) by State

The results table shown includes only the first 10 rows.

StateAvgDamageToCrops
TEXAS7524.569241
KANSAS15366.86671
IOWA4332.477535
ILLINOIS44568.00198
MISSOURI340719.2212
GEORGIA490702.5214
MINNESOTA2835.991494
WISCONSIN17764.37838
NEBRASKA21366.36467
NEW YORK5.714285714

5 - avgif() (aggregation function)

Learn how to use the avgif() function to return the average value of an expression where the predicate evaluates to true.

Calculates the average of expr in records for which predicate evaluates to true.

Syntax

avgif (expr, predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation.
predicatestring✔️The predicate that if true, the expr calculated value is added to the average.

Returns

Returns the average value of expr in records where predicate evaluates to true.

Example

The following example calculates the average damage by state in cases where there was any damage.

StormEvents
| summarize Averagedamage=tolong(avg( DamageCrops)),AverageWhenDamage=tolong(avgif(DamageCrops,DamageCrops >0)) by State

Output

The results table shown includes only the first 10 rows.

StateAveragedamageAveragewhendamage
TEXAS7524491291
KANSAS15366695021
IOWA433228203
ILLINOIS445682574757
MISSOURI3407198806281
GEORGIA49070257239005
MINNESOTA2835144175
WISCONSIN17764438188
NEBRASKA21366187726
NEW YORK510000

6 - binary_all_and() (aggregation function)

Learn how to use the binary_all_and() function to aggregate values using the binary AND operation.

Accumulates values using the binary AND operation for each summarization group, or in total if a group isn’t specified.

Syntax

binary_all_and (expr)

Parameters

NameTypeRequiredDescription
exprlong✔️The value used for the binary AND calculation.

Returns

Returns an aggregated value using the binary AND operation over records for each summarization group, or in total if a group isn’t specified.

Example

The following example produces CAFEF00D using binary AND operations:

datatable(num:long)
[
  0xFFFFFFFF, 
  0xFFFFF00F,
  0xCFFFFFFD,
  0xFAFEFFFF,
]
| summarize result = toupper(tohex(binary_all_and(num)))

Output

result
CAFEF00D

7 - binary_all_or() (aggregation function)

Learn how to use the binary_all_or() function to aggregate values using the binary OR operation.

Accumulates values using the binary OR operation for each summarization group, or in total if a group isn’t specified.

Syntax

binary_all_or (expr)

Parameters

NameTypeRequiredDescription
exprlong✔️The value used for the binary OR calculation.

Returns

Returns an aggregated value using the binary OR operation over records for each summarization group, or in total if a group isn’t specified.

Example

The following example produces CAFEF00D using binary OR operations:

datatable(num:long)
[
  0x88888008,
  0x42000000,
  0x00767000,
  0x00000005, 
]
| summarize result = toupper(tohex(binary_all_or(num)))

Output

result
CAFEF00D

8 - binary_all_xor() (aggregation function)

Learn how to use the binary_all_xor() function to aggregate values using the binary XOR operation.

Accumulates values using the binary XOR operation for each summarization group, or in total if a group is not specified.

Syntax

binary_all_xor (expr)

Parameters

NameTypeRequiredDescription
exprlong✔️The value used for the binary XOR calculation.

Returns

Returns a value that is aggregated using the binary XOR operation over records for each summarization group, or in total if a group isn’t specified.

Example

The following example produces CAFEF00D using binary XOR operations:

datatable(num:long)
[
  0x44404440,
  0x1E1E1E1E,
  0x90ABBA09,
  0x000B105A,
]
| summarize result = toupper(tohex(binary_all_xor(num)))

Output

results
CAFEF00D

9 - buildschema() (aggregation function)

Learn how to use the buildschema() function to build a table schema from a dynamic expression.

Builds the minimal schema that admits all values of DynamicExpr.

Syntax

buildschema (DynamicExpr)

Parameters

NameTypeRequiredDescription
DynamicExprdynamic✔️Expression used for the aggregation calculation.

Returns

Returns the minimal schema that admits all values of DynamicExpr.

Example

The following example builds a schema based on:

  • {"x":1, "y":3.5}
  • {"x":"somevalue", "z":[1, 2, 3]}
  • {"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]}
datatable(value: dynamic) [
    dynamic({"x":1, "y":3.5}),
    dynamic({"x":"somevalue", "z":[1, 2, 3]}),
    dynamic({"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]})
]
| summarize buildschema(value)

Results

schema_value
{“x”:[“long”,“string”],“y”:[“double”,{“w”:“string”}],“z”:{"indexer":[“long”,“string”]},“t”:{"indexer":“string”}}

Schema breakdown

In the resulting schema:

  • The root object is a container with four properties named x, y, z, and t.
  • Property x is either type long or type string.
  • Property y is either type double or another container with a property w of type string.
  • Property z is an array, indicated by the indexer keyword, where each item can be either type long or type string.
  • Property t is an array, indicated by the indexer keyword, where each item is a string.
  • Every property is implicitly optional, and any array might be empty.

10 - count_distinct() (aggregation function) - (preview)

Learn how to use the count_distinct() (aggregation function) to count unique values specified by a scalar expression per summary group.

Counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted.

If you only need an estimation of unique values count, we recommend using the less resource-consuming dcount aggregation function.

To count only records for which a predicate returns true, use the count_distinctif aggregation function.

Syntax

count_distinct (expr)

Parameters

NameTypeRequiredDescription
exprscalar✔️The expression whose unique values are to be counted.

Returns

Long integer value indicating the number of unique values of expr per summary group.

Example

The following example shows how many types of storm events happened in each state.

Function performance can be degraded when operating on multiple data sources from different clusters.

StormEvents
| summarize UniqueEvents=count_distinct(EventType) by State
| top 5 by UniqueEvents

Output

StateUniqueEvents
TEXAS27
CALIFORNIA26
PENNSYLVANIA25
GEORGIA24
NORTH CAROLINA23

11 - count_distinctif() (aggregation function) - (preview)

Learn how to use the count_distinctif() function to count unique values of a scalar expression in records for which the predicate evaluates to true.

Conditionally counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted. Only records for which predicate evaluates to true are counted.

If you only need an estimation of unique values count, we recommend using the less resource-consuming dcountif aggregation function.

Syntax

count_distinctif (expr, predicate)

Parameters

NameTypeRequiredDescription
exprscalar✔️The expression whose unique values are to be counted.
predicatestring✔️The expression used to filter records to be aggregated.

Returns

Integer value indicating the number of unique values of expr per summary group, for all records for which the predicate evaluates to true.

Example

The following example shows how many types of death-causing storm events happened in each state. Only storm events with a nonzero count of deaths are counted.

StormEvents
| summarize UniqueFatalEvents=count_distinctif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where UniqueFatalEvents > 0
| top 5 by UniqueFatalEvents

Output

StateUniqueFatalEvents
TEXAS12
CALIFORNIA12
OKLAHOMA10
NEW YORK9
KANSAS9

12 - count() (aggregation function)

Learn how to use the count() function to count the number of records in a group.

Counts the number of records per summarization group, or total if summarization is done without grouping.

To only count records for which a predicate returns true, use countif().

Syntax

count()

Returns

Returns a count of the records per summarization group, or in total if summarization is done without grouping.

Example

The following example returns a count of events in states:

StormEvents
| summarize Count=count() by State

Output

StateCount
TEXAS4701
KANSAS3166
IOWA2337
ILLINOIS2022
MISSOURI2016
GEORGIA1983
MINNESOTA1881
WISCONSIN1850
NEBRASKA1766
NEW YORK1750

13 - countif() (aggregation function)

Learn how to use the countif() function to count the rows where the predicate evaluates to true.

Counts the rows in which predicate evaluates to true.

Syntax

countif (predicate)

Parameters

NameTypeRequiredDescription
predicatestring✔️The expression used for aggregation calculation. The value can be any scalar expression with a return type of bool.

Returns

Returns a count of rows in which predicate evaluates to true.

Examples

Count storms by state

This example shows the number of storms with damage to crops by state.

StormEvents
| summarize TotalCount=count(),TotalWithDamage=countif(DamageCrops >0) by State

The results table shown includes only the first 10 rows.

StateTotalCountTotalWithDamage
TEXAS470172
KANSAS316670
IOWA2337359
ILLINOIS202235
MISSOURI201678
GEORGIA198317
MINNESOTA188137
WISCONSIN185075
NEBRASKA1766201
NEW YORK17501

Count based on string length

This example shows the number of names with more than four letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize countif(strlen(name) > 4)

Output

countif_
2

14 - dcount() (aggregation function)

Learn how to use the dcount() function to return an estimate of the number of distinct values of an expression within a group.

Calculates an estimate of the number of distinct values that are taken by a scalar expression in the summary group.

Syntax

dcount (expr[, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The input whose distinct values are to be counted.
accuracyintThe value that defines the requested estimation accuracy. The default value is 1. See Estimation accuracy for supported values.

Returns

Returns an estimate of the number of distinct values of expr in the group.

Example

This example shows how many types of storm events happened in each state.

StormEvents
| summarize DifferentEvents=dcount(EventType) by State
| order by DifferentEvents

The results table shown includes only the first 10 rows.

StateDifferentEvents
TEXAS27
CALIFORNIA26
PENNSYLVANIA25
GEORGIA24
ILLINOIS23
MARYLAND23
NORTH CAROLINA23
MICHIGAN22
FLORIDA22
OREGON21
KANSAS21

Estimation accuracy

15 - dcountif() (aggregation function)

Learn how to use the dcountif() function to return an estimate of the number of distinct values of an expression for rows where the predicate evaluates to true.

Estimates the number of distinct values of expr for rows in which predicate evaluates to true.

Syntax

dcountif (expr, predicate, [, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The expression used to filter rows.
accuracyintThe control between speed and accuracy. If unspecified, the default value is 1. See Estimation accuracy for supported values.

Returns

Returns an estimate of the number of distinct values of expr for rows in which predicate evaluates to true.

Example

This example shows how many types of fatal storm events happened in each state.

StormEvents
| summarize DifferentFatalEvents=dcountif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where DifferentFatalEvents > 0
| order by DifferentFatalEvents 

The results table shown includes only the first 10 rows.

StateDifferentFatalEvents
CALIFORNIA12
TEXAS12
OKLAHOMA10
ILLINOIS9
KANSAS9
NEW YORK9
NEW JERSEY7
WASHINGTON7
MICHIGAN7
MISSOURI7

Estimation accuracy

16 - hll_if() (aggregation function)

Learn how to use the hll_if() function to calculate the intermediate results of the dcount() function.

Calculates the intermediate results of dcount in records for which the predicate evaluates to true.

Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.

Syntax

hll_if (expr, predicate [, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The Expr used to filter records to add to the intermediate result of dcount.
accuracyintThe value that controls the balance between speed and accuracy. If unspecified, the default value is 1. For supported values, see Estimation accuracy.

Returns

Returns the intermediate results of distinct count of Expr for which Predicate evaluates to true.

Examples

The following query results in the number of unique flood event sources in Iowa and Kansas. It uses the hll_if() function to show only flood events.

StormEvents
| where State in ("IOWA", "KANSAS")
| summarize hll_flood = hll_if(Source, EventType == "Flood") by State
| project State, SourcesOfFloodEvents = dcount_hll(hll_flood)

Output

StateSourcesOfFloodEvents
KANSAS11
IOWA7

Estimation accuracy

AccuracySpeedError (%)
0Fastest1.6
1Balanced0.8
2Slow0.4
3Slow0.28
4Slowest0.2

17 - hll_merge() (aggregation function)

Learn how to use the hll_merge() function to merge HLL results into a single HLL value.

Merges HLL results across the group into a single HLL value.

For more information, see the underlying algorithm (HyperLogLog) and estimation accuracy.

Syntax

hll_merge (hll)

Parameters

NameTypeRequiredDescription
hllstring✔️The column name containing HLL values to merge.

Returns

The function returns the merged HLL values of hll across the group.

Example

The following example shows HLL results across a group merged into a single HLL value.

StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)

Output

The results show only the first five results in the array.

hllMerged
[[1024,14],["-6903255281122589438","-7413697181929588220","-2396604341988936699",“5824198135224880646”,"-6257421034880415225", …],[]]

Estimation accuracy

18 - hll() (aggregation function)

Learn how to use the hll() function to calculate the results of the dcount() function.

The hll() function is a way to estimate the number of unique values in a set of values. It does so by calculating intermediate results for aggregation within the summarize operator for a group of data using the dcount function.

Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.

Syntax

hll (expr [, accuracy])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
accuracyintThe value that controls the balance between speed and accuracy. If unspecified, the default value is 1. For supported values, see Estimation accuracy.

Returns

Returns the intermediate results of distinct count of expr across the group.

Example

In the following example, the hll() function is used to estimate the number of unique values of the DamageProperty column within each 10-minute time bin of the StartTime column.

StormEvents
| summarize hll(DamageProperty) by bin(StartTime,10m)

Output

The results table shown includes only the first 10 rows.

StartTimehll_DamageProperty
2007-01-01T00:20:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T01:00:00Z[[1024,14],[“7755241107725382121”,"-5665157283053373866",“3803688792395291579”,"-1003235211361077779"],[]]
2007-01-01T02:00:00Z[[1024,14],["-1003235211361077779","-5665157283053373866",“7755241107725382121”],[]]
2007-01-01T02:20:00Z[[1024,14],[“7755241107725382121”],[]]
2007-01-01T03:30:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T03:40:00Z[[1024,14],["-5665157283053373866"],[]]
2007-01-01T04:30:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T05:30:00Z[[1024,14],[“3803688792395291579”],[]]
2007-01-01T06:30:00Z[[1024,14],[“1589522558235929902”],[]]

Estimation accuracy

19 - make_bag_if() (aggregation function)

Learn how to use the make_bag_if() function to create a dynamic JSON property bag of expression values where the predicate evaluates to true.

Creates a dynamic JSON property bag (dictionary) of expr values in records for which predicate evaluates to true.

Syntax

make_bag_if(expr, predicate [, maxSize])

Parameters

NameTypeRequiredDescription
exprdynamic✔️The expression used for the aggregation calculation.
predicatebool✔️The predicate that evaluates to true, in order for expr to be added to the result.
maxSizeintThe limit on the maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic JSON property bag (dictionary) of expr values in records for which predicate evaluates to true. Nondictionary values are skipped. If a key appears in more than one row, an arbitrary value, out of the possible values for this key, are selected.

Example

The following example shows a packed JSON property bag.

let T = datatable(prop:string, value:string, predicate:bool)
[
    "prop01", "val_a", true,
    "prop02", "val_b", false,
    "prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag_if(p, predicate)

Output

dict
{ “prop01”: “val_a”, “prop03”: “val_c” }

Use bag_unpack() plugin for transforming the bag keys in the make_bag_if() output into columns.

let T = datatable(prop:string, value:string, predicate:bool)
[
    "prop01", "val_a", true,
    "prop02", "val_b", false,
    "prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag_if(p, predicate)
| evaluate bag_unpack(bag)

Output

prop01prop03
val_aval_c

20 - make_bag() (aggregation function)

Learn how to use the make_bag() aggregation function to create a dynamic JSON property bag.

Creates a dynamic JSON property bag (dictionary) of all the values of expr in the group.

Syntax

make_bag (expr [, maxSize])

Parameters

NameTypeRequiredDescription
exprdynamic✔️The expression used for the aggregation calculation.
maxSizeintThe limit on the maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic JSON property bag (dictionary) of all the values of Expr in the group, which are property bags. Nondictionary values are skipped. If a key appears in more than one row, an arbitrary value, out of the possible values for this key, is selected.

Example

The following example shows a packed JSON property bag.

let T = datatable(prop:string, value:string)
[
    "prop01", "val_a",
    "prop02", "val_b",
    "prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag(p)

Output

dict
{ “prop01”: “val_a”, “prop02”: “val_b”, “prop03”: “val_c” }

Use the bag_unpack() plugin for transforming the bag keys in the make_bag() output into columns.

let T = datatable(prop:string, value:string)
[
    "prop01", "val_a",
    "prop02", "val_b",
    "prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag(p)
| evaluate bag_unpack(bag)

Output

prop01prop02prop03
val_aval_bval_c

21 - make_list_if() (aggregation function)

Learn how to use the make_list_if() aggregation function to create a dynamic JSON object of expression values where the predicate evaluates to true.

Creates a dynamic array of expr values in the group for which predicate evaluates to true.

Syntax

make_list_if(expr, predicate [, maxSize])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️A predicate that has to evaluate to true in order for expr to be added to the result.
maxSizeintegerThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of expr values in the group for which predicate evaluates to true. If the input to the summarize operator isn’t sorted, the order of elements in the resulting array is undefined. If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.

Example

The following example shows a list of names with more than 4 letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize make_list_if(name, strlen(name) > 4)

Output

list_name
[“George”, “Ringo”]

22 - make_list_with_nulls() (aggregation function)

Learn how to use the make_list_with_nulls() aggregation function to create a dynamic JSON object (array) which includes null values.

Creates a dynamic array of all the values of expr in the group, including null values.

Syntax

make_list_with_nulls(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression that to use to create the array.

Returns

Returns a dynamic JSON object (array) of all the values of expr in the group, including null values. If the input to the summarize operator isn’t sorted, the order of elements in the resulting array is undefined. If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.

Example

The following example shows null values in the results.

let shapes = datatable (name:string , sideCount: int)
[
    "triangle", int(null),
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| summarize mylist = make_list_with_nulls(sideCount)

Output

mylist
[null,4,4,5,6,7,8,9,10]

23 - make_list() (aggregation function)

Learn how to use the make_list() function to create a dynamic JSON object array of all the values of the expressions in the group.

Creates a dynamic array of all the values of expr in the group.

Syntax

make_list(expr [, maxSize])

Parameters

NameTypeRequiredDescription
exprdynamic✔️The expression used for the aggregation calculation.
maxSizeintThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of all the values of expr in the group. If the input to the summarize operator isn’t sorted, the order of elements in the resulting array is undefined. If the input to the summarize operator is sorted, the order of elements in the resulting array tracks that of the input.

Examples

The examples in this section show how to use the syntax to help you get started.

One column

The following example uses the datatable, shapes, to return a list of shapes in a single column.

let shapes = datatable (name: string, sideCount: int)
[
    "triangle", 3,
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| summarize mylist = make_list(name)

Output

mylist
[“triangle”,“square”,“rectangle”,“pentagon”,“hexagon”,“heptagon”,“octagon”,“nonagon”,“decagon”]

Using the ‘by’ clause

The following example uses the make_list function and the by clause to create two lists of objects grouped by whether they have an even or odd number of sides.

let shapes = datatable (name: string, sideCount: int)
[
    "triangle", 3,
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| summarize mylist = make_list(name) by isEvenSideCount = sideCount % 2 == 0

Output

isEvenSideCountmylist
false[“triangle”,“pentagon”,“heptagon”,“nonagon”]
true[“square”,“rectangle”,“hexagon”,“octagon”,“decagon”]

Packing a dynamic object

The following examples show how to pack a dynamic object in a column before making it a list. It returns a column with a boolean table isEvenSideCount indicating whether the side count is even or odd and a mylist column that contains lists of packed bags int each category.

let shapes = datatable (name: string, sideCount: int)
[
    "triangle", 3,
    "square", 4,
    "rectangle", 4,
    "pentagon", 5,
    "hexagon", 6,
    "heptagon", 7,
    "octagon", 8,
    "nonagon", 9,
    "decagon", 10
];
shapes
| extend d = bag_pack("name", name, "sideCount", sideCount)
| summarize mylist = make_list(d) by isEvenSideCount = sideCount % 2 == 0

Output

isEvenSideCountmylist
false[{“name”:“triangle”,“sideCount”:3},{“name”:“pentagon”,“sideCount”:5},{“name”:“heptagon”,“sideCount”:7},{“name”:“nonagon”,“sideCount”:9}]
true[{“name”:“square”,“sideCount”:4},{“name”:“rectangle”,“sideCount”:4},{“name”:“hexagon”,“sideCount”:6},{“name”:“octagon”,“sideCount”:8},{“name”:“decagon”,“sideCount”:10}]

24 - make_set_if() (aggregation function)

Learn how to use the make_set_if() function to create a dynamic JSON object of a set of distinct values that an expression takes where the predicate evaluates to true.

Creates a dynamic array of the set of distinct values that expr takes in records for which predicate evaluates to true.

Syntax

make_set_if(expr, predicate [, maxSize])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️A predicate that has to evaluate to true in order for expr to be added to the result.
maxSizeintThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of the set of distinct values that expr takes in records for which predicate evaluates to true. The array’s sort order is undefined.

Example

The following example shows a list of names with more than four letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize make_set_if(name, strlen(name) > 4)

Output

set_name
[“George”, “Ringo”]

25 - make_set() (aggregation function)

Learn how to use the make_set() function to return a JSON array of the distinct values that the expression takes in the group.

Creates a dynamic array of the set of distinct values that expr takes in the group.

Syntax

make_set(expr [, maxSize])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
maxSizeintThe maximum number of elements returned. The default and max value is 1048576.

Returns

Returns a dynamic array of the set of distinct values that expr takes in the group. The array’s sort order is undefined.

Example

Set from a scalar column

The following example shows the set of states grouped with the same amount of crop damage.

StormEvents 
| summarize states=make_set(State) by DamageCrops

The results table shown includes only the first 10 rows.

DamageCropsstates
0[“NORTH CAROLINA”,“WISCONSIN”,“NEW YORK”,“ALASKA”,“DELAWARE”,“OKLAHOMA”,“INDIANA”,“ILLINOIS”,“MINNESOTA”,“SOUTH DAKOTA”,“TEXAS”,“UTAH”,“COLORADO”,“VERMONT”,“NEW JERSEY”,“VIRGINIA”,“CALIFORNIA”,“PENNSYLVANIA”,“MONTANA”,“WASHINGTON”,“OREGON”,“HAWAII”,“IDAHO”,“PUERTO RICO”,“MICHIGAN”,“FLORIDA”,“WYOMING”,“GULF OF MEXICO”,“NEVADA”,“LOUISIANA”,“TENNESSEE”,“KENTUCKY”,“MISSISSIPPI”,“ALABAMA”,“GEORGIA”,“SOUTH CAROLINA”,“OHIO”,“NEW MEXICO”,“ATLANTIC SOUTH”,“NEW HAMPSHIRE”,“ATLANTIC NORTH”,“NORTH DAKOTA”,“IOWA”,“NEBRASKA”,“WEST VIRGINIA”,“MARYLAND”,“KANSAS”,“MISSOURI”,“ARKANSAS”,“ARIZONA”,“MASSACHUSETTS”,“MAINE”,“CONNECTICUT”,“GUAM”,“HAWAII WATERS”,“AMERICAN SAMOA”,“LAKE HURON”,“DISTRICT OF COLUMBIA”,“RHODE ISLAND”,“LAKE MICHIGAN”,“LAKE SUPERIOR”,“LAKE ST CLAIR”,“LAKE ERIE”,“LAKE ONTARIO”,“E PACIFIC”,“GULF OF ALASKA”]
30000[“TEXAS”,“NEBRASKA”,“IOWA”,“MINNESOTA”,“WISCONSIN”]
4000000[“CALIFORNIA”,“KENTUCKY”,“NORTH DAKOTA”,“WISCONSIN”,“VIRGINIA”]
3000000[“CALIFORNIA”,“ILLINOIS”,“MISSOURI”,“SOUTH CAROLINA”,“NORTH CAROLINA”,“MISSISSIPPI”,“NORTH DAKOTA”,“OHIO”]
14000000[“CALIFORNIA”,“NORTH DAKOTA”]
400000[“CALIFORNIA”,“MISSOURI”,“MISSISSIPPI”,“NEBRASKA”,“WISCONSIN”,“NORTH DAKOTA”]
50000[“CALIFORNIA”,“GEORGIA”,“NEBRASKA”,“TEXAS”,“WEST VIRGINIA”,“KANSAS”,“MISSOURI”,“MISSISSIPPI”,“NEW MEXICO”,“IOWA”,“NORTH DAKOTA”,“OHIO”,“WISCONSIN”,“ILLINOIS”,“MINNESOTA”,“KENTUCKY”]
18000[“WASHINGTON”,“WISCONSIN”]
107900000[“CALIFORNIA”]
28900000[“CALIFORNIA”]

Set from array column

The following example shows the set of elements in an array.

datatable (Val: int, Arr1: dynamic)
[
    1, dynamic(['A1', 'A2', 'A3']), 
    5, dynamic(['A2', 'C1']),
    7, dynamic(['C2', 'A3']),
    5, dynamic(['C2', 'A1'])
] 
| summarize Val_set=make_set(Val), Arr1_set=make_set(Arr1)
Val_setArr1_set
[1,5,7][“A1”,“A2”,“A3”,“C1”,“C2”]

26 - max() (aggregation function)

Learn how to use the max() function to find the maximum value of the expression in the table.

Finds the maximum value of the expression in the table.

Syntax

max(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression for which the maximum value is determined.

Returns

Returns the value in the table that maximizes the specified expression.

Example

The following example returns the last record in a table by querying the maximum value for StartTime.

StormEvents
| summarize LatestEvent=max(StartTime)

Output

LatestEvent
2007-12-31T23:53:00Z

27 - maxif() (aggregation function)

Learn how to use the maxif() function to calculate the maximum value of an expression where the predicate evaluates to true.

Calculates the maximum value of expr in records for which predicate evaluates to true.

See also - max() function, which returns the maximum value across the group without predicate expression.

Syntax

maxif(expr,predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The expression used to filter rows.

Returns

Returns the maximum value of expr in records for which predicate evaluates to true.

Example

This example shows the maximum damage for events with no casualties.

StormEvents
| extend Damage=DamageCrops + DamageProperty, Deaths=DeathsDirect + DeathsIndirect
| summarize MaxDamageNoCasualties=maxif(Damage, Deaths == 0) by State

Output

The results table shown includes only the first 10 rows.

TEXAS25000000
KANSAS37500000
IOWA15000000
ILLINOIS5000000
MISSOURI500005000
GEORGIA344000000
MINNESOTA38390000
WISCONSIN45000000
NEBRASKA4000000
NEW YORK26000000

28 - min() (aggregation function)

Learn how to use the min() function to find the minimum value in a table.

Finds the minimum value of the expression in the table.

Syntax

min (expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression for which the minimum value is determined.

Returns

Returns the minimum value of expr across the table.

Example

This example returns the first record in a table.

StormEvents
| summarize FirstEvent=min(StartTime)

Output

FirstEvent
2007-01-01T00:00:00Z

29 - minif() (aggregation function)

Learn how to use the minif() function to return the minimum value of an expression where the predicate evaluates to true.

Returns the minimum of Expr in records for which Predicate evaluates to true.

  • Can be used only in context of aggregation inside summarize

See also - min() function, which returns the minimum value across the group without predicate expression.

Syntax

minif (Expr,Predicate)

Parameters

NameTypeRequiredDescription
Exprstring✔️Expression that will be used for aggregation calculation.
Predicatestring✔️Expression that will be used to filter rows.

Returns

The minimum value of Expr in records for which Predicate evaluates to true.

Example

This example shows the minimum damage for events with casualties (Except 0)

StormEvents
| extend Damage=DamageCrops+DamageProperty, Deaths=DeathsDirect+DeathsIndirect
| summarize MinDamageWithCasualties=minif(Damage,(Deaths >0) and (Damage >0)) by State 
| where MinDamageWithCasualties >0 and isnotnull(MinDamageWithCasualties)

Output

The results table shown includes only the first 10 rows.

StateMinDamageWithCasualties
TEXAS8000
KANSAS5000
IOWA45000
ILLINOIS100000
MISSOURI10000
GEORGIA500000
MINNESOTA200000
WISCONSIN10000
NEW YORK25000
NORTH CAROLINA15000

30 - percentile(), percentiles()

Learn how to use the percentile(), percentiles() functions to calculate estimates for nearest rank percentiles.

The percentile() function calculates an estimate for the specified nearest-rank percentile of the population defined by expr. The accuracy depends on the density of population in the region of the percentile.

percentiles() works similarly to percentile(). However, percentiles() can calculate multiple percentile values at once, which is more efficient than calculating each percentile value separately.

To calculate weighted percentiles, see percentilesw().

Syntax

percentile(expr, percentile)

percentiles(expr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
percentileint or long✔️A constant that specifies the percentile.
percentilesint or long✔️One or more comma-separated percentiles.

Returns

Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.

Examples

Calculate single percentile

The following example shows the value of DamageProperty being larger than 95% of the sample set and smaller than 5% of the sample set.

StormEvents | summarize percentile(DamageProperty, 95) by State

Output

The results table shown includes only the first 10 rows.

Statepercentile_DamageProperty_95
ATLANTIC SOUTH0
FLORIDA40000
GEORGIA143333
MISSISSIPPI80000
AMERICAN SAMOA250000
KENTUCKY35000
OHIO150000
KANSAS51392
MICHIGAN49167
ALABAMA50000

Calculate multiple percentiles

The following example shows the value of DamageProperty simultaneously calculated using 5, 50 (median) and 95.

StormEvents | summarize percentiles(DamageProperty, 5, 50, 95) by State

Output

The results table shown includes only the first 10 rows.

Statepercentile_DamageProperty_5percentile_DamageProperty_50percentile_DamageProperty_95
ATLANTIC SOUTH000
FLORIDA0040000
GEORGIA00143333
MISSISSIPPI0080000
AMERICAN SAMOA00250000
KENTUCKY0035000
OHIO02000150000
KANSAS0051392
MICHIGAN0049167
ALABAMA0050000

Return percentiles as an array

Instead of returning the values in individual columns, use the percentiles_array() function to return the percentiles in a single column of dynamic array type.

Syntax

percentiles_array(expr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
percentilesint, long, or dynamic✔️One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value.

Returns

Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.

Examples

Comma-separated percentiles

Multiple percentiles can be obtained as an array in a single dynamic column, instead of in multiple columns as with percentiles().

TransformedSensorsData
| summarize percentiles_array(Value, 5, 25, 50, 75, 95), avg(Value) by SensorName

Output

The results table displays only the first 10 rows.

SensorNamepercentiles_Valueavg_Value
sensor-82[“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”]0.493950914
sensor-130[“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”]0.505111463
sensor-56[“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”]0.497955018
sensor-24[“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”]0.501084379
sensor-47[“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”]0.49386228
sensor-135[“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”]0.494817619
sensor-74[“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”]0.501627252
sensor-173[“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”]0.505401226
sensor-28[“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”]0.502066244
sensor-34[“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”]0.504309494

Dynamic array of percentiles

Percentiles for percentiles_array can be specified in a dynamic array of integer or floating-point numbers. The array must be constant but doesn’t have to be literal.

TransformedSensorsData
| summarize percentiles_array(Value, dynamic([5, 25, 50, 75, 95])), avg(Value) by SensorName

Output

The results table displays only the first 10 rows.

SensorNamepercentiles_Valueavg_Value
sensor-82[“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”]0.493950914
sensor-130[“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”]0.505111463
sensor-56[“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”]0.497955018
sensor-24[“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”]0.501084379
sensor-47[“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”]0.49386228
sensor-135[“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”]0.494817619
sensor-74[“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”]0.501627252
sensor-173[“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”]0.505401226
sensor-28[“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”]0.502066244
sensor-34[“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”]0.504309494

Nearest-rank percentile

P-th percentile (0 < P <= 100) of a list of ordered values, sorted in ascending order, is the smallest value in the list. The P percent of the data is less or equal to P-th percentile value (from Wikipedia article on percentiles).

Define 0-th percentiles to be the smallest member of the population.

Estimation error in percentiles

The percentiles aggregate provides an approximate value using T-Digest.

31 - percentilew(), percentilesw()

Learn how to use the percentilew(), percentilesw() functions to calculate weighted percentiles.

The percentilew() function calculates a weighted estimate for the specified nearest-rank percentile of the population defined by expr. percentilesw() works similarly to percentilew(). However, percentilesw() can calculate multiple weighted percentile values at once, which is more efficient than calculating each weighted percentile value separately.

Weighted percentiles calculate percentiles in a dataset by giving each value in the input dataset a weight. In this method, each value is considered to be repeated a number of times equal to its weight, which is then used to calculate the percentile. By giving more importance to certain values, weighted percentiles provide a way to calculate percentiles in a “weighted” manner.

To calculate unweighted percentiles, see percentiles().

Syntax

percentilew(expr, weightExpr, percentile)

percentilesw(expr, weightExpr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
weightExprlong✔️The weight to give each value.
percentileint or long✔️A constant that specifies the percentile.
percentilesint or long✔️One or more comma-separated percentiles.

Returns

Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.

Examples

Calculate weighted percentiles

Assume you repetitively measure the time (Duration) it takes an action to complete. Instead of recording every value of the measurement, you record each value of Duration, rounded to 100 msec, and how many times the rounded value appeared (BucketSize).

Use summarize percentilesw(Duration, BucketSize, ...) to calculate the given percentiles in a “weighted” way. Treat each value of Duration as if it was repeated BucketSize times in the input, without actually needing to materialize those records.

The following example shows weighted percentiles. Using the following set of latency values in milliseconds: { 1, 1, 2, 2, 2, 5, 7, 7, 12, 12, 15, 15, 15, 18, 21, 22, 26, 35 }.

To reduce bandwidth and storage, do pre-aggregation to the following buckets: { 10, 20, 30, 40, 50, 100 }. Count the number of events in each bucket to produce the following table:

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable

The table displays:

  • Eight events in the 10-ms bucket (corresponding to subset { 1, 1, 2, 2, 2, 5, 7, 7 })
  • Six events in the 20-ms bucket (corresponding to subset { 12, 12, 15, 15, 15, 18 })
  • Three events in the 30-ms bucket (corresponding to subset { 21, 22, 26 })
  • One event in the 40-ms bucket (corresponding to subset { 35 })

At this point, the original data is no longer available. Only the number of events in each bucket. To compute percentiles from this data, use the percentilesw() function. For the 50, 75, and 99.9 percentiles, use the following query:

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable
| summarize percentilesw(LatencyBucket, ReqCount, 50, 75, 99.9)

Output

percentile_LatencyBucket_50percentile_LatencyBucket_75percentile_LatencyBucket_99_9
202040

Return percentiles as an array

Instead of returning the values in individual columns, use the percentilesw_array() function to return the percentiles in a single column of dynamic array type.

Syntax

percentilesw_array(expr, weightExpr, percentiles)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for aggregation calculation.
percentilesint, long, or dynamic✔️One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value.
weightExprlong✔️The weight to give each value.

Returns

Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.

Examples

Comma-separated percentiles

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, 50, 75, 99.9)

Output

percentile_LatencyBucket
[20, 20, 40]

Dynamic array of percentiles

let latencyTable = datatable (ReqCount:long, LatencyBucket:long) 
[ 
    8, 10, 
    6, 20, 
    3, 30, 
    1, 40 
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, dynamic([50, 75, 99.9]))

Output

percentile_LatencyBucket
[20, 20, 40]

32 - stdev() (aggregation function)

Learn how to use the stdev() aggregation function to calculate the standard deviation of an expression using Bessel’s correction.

Calculates the standard deviation of expr across the group, using Bessel’s correction for a small dataset that is considered a sample.

For a large dataset that is representative of the population, use stdevp() (aggregation function).

Formula

This function uses the following formula.

Image showing a Stdev sample formula.

Syntax

stdev(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the standard deviation aggregation calculation.

Returns

Returns the standard deviation value of expr across the group.

Example

The following example shows the standard deviation for the group.

range x from 1 to 5 step 1
| summarize make_list(x), stdev(x)

Output

list_xstdev_x
[ 1, 2, 3, 4, 5]1.58113883008419

33 - stdevif() (aggregation function)

Learn how to use the stdevif() function to calculate the standard deviation of an expression where the predicate evaluates to true.

Calculates the standard deviation of expr in records for which predicate evaluates to true.

Syntax

stdevif(expr,predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the standards deviation aggregation calculation.
predicatestring✔️The predicate that has to evaluate to true in order for expr to be added to the result.

Returns

Returns the standard deviation value of expr in records for which predicate evaluates to true.

Example

The following example shows the standard deviation in a range of 1 to 100.

range x from 1 to 100 step 1
| summarize stdevif(x, x % 2 == 0)

Output

stdevif_x
29.1547594742265

34 - stdevp() (aggregation function)

Learn how to use the stdevp() aggregation function to calculate the standard deviation of an expression.

Calculates the standard deviation of expr across the group, considering the group as a population for a large dataset that is representative of the population.

For a small dataset that is a sample, use stdev() (aggregation function).

Formula

This function uses the following formula.

Image showing a Stdev sample formula.

Syntax

stdevp(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the standards deviation aggregation calculation.

Returns

Returns the standard deviation value of expr across the group.

Example

range x from 1 to 5 step 1
| summarize make_list(x), stdevp(x)

Output

list_xstdevp_x
[ 1, 2, 3, 4, 5]1.4142135623731

35 - sum() (aggregation function)

Learn how to use the sum() (aggregation function) function to calculate the sum of an expression across the group.

Calculates the sum of expr across the group.

Syntax

sum(expr)

Parameters

NameTypeRequiredDescription
expr string✔️The expression used for the aggregation calculation.

Returns

Returns the sum value of expr across the group.

Example

This example returns the total value of crop and property damages by state, and sorted in descending value.

StormEvents 
| summarize EventCount=count(), TotalDamages = sum(DamageCrops+DamageProperty) by State 
| sort by TotalDamages

Output

The results table shown includes only the first 10 rows.

| State | Eventcount | TotalDamages | | —- | — | | CALIFORNIA | 898 | 2801954600 | | GEORGIA | 1983 | 1190448750 | | MISSOURI | 2016 | 1096887450 | | OKLAHOMA | 1716 | 916557300 | | MISSISSIPPI | 1218 | 802890160 | | KANSAS | 3166 | 738830000 | | TEXAS | 4701 | 572086700 | | OHIO | 1233 | 417989500 | | FLORIDA | 1042 | 379455260 | | NORTH DAKOTA | 905 | 342460100 | | … | … | … |

36 - sumif() (aggregation function)

Learn how to use the sumif() (aggregation function) function to calculate the sum of an expression value in records for which the predicate evaluates to true.

Calculates the sum of expr in records for which predicate evaluates to true.

You can also use the sum() function, which sums rows without predicate expression.

Syntax

sumif(expr,predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
predicatestring✔️The expression used to filter rows. If the predicate evaluates to true, the row will be included in the result.

Returns

Returns the sum of expr for which predicate evaluates to true.

Example showing the sum of damages based on no casualty count

This example shows the sum total damage for storms without casualties.

StormEvents
| summarize DamageNoCasualties=sumif((DamageCrops+DamageProperty),(DeathsDirect+DeathsIndirect)==0) by State

Output

The results table shown includes only the first 10 rows.

StateDamageNoCasualties
TEXAS242638700
KANSAS407360000
IOWA135353700
ILLINOIS120394500
MISSOURI1096077450
GEORGIA1077448750
MINNESOTA230407300
WISCONSIN241550000
NEBRASKA70356050
NEW YORK58054000

Example showing the sum of birth dates

This example shows the sum of the birth dates for all names that have more than 4 letters.

let T = datatable(name:string, day_of_birth:long)
[
   "John", 9,
   "Paul", 18,
   "George", 25,
   "Ringo", 7
];
T
| summarize sumif(day_of_birth, strlen(name) > 4)

Output

sumif_day_of_birth
32

37 - take_any() (aggregation function)

Learn how to use the take_any() (aggregation function) to return the value of an arbitrarily selected record.

Arbitrarily chooses one record for each group in a summarize operator, and returns the value of one or more expressions over each such record.

Syntax

take_any(expr_1 [, expr_2 …])

take_any(*)

Parameters

NameTypeRequiredDescription
expr_Nstring✔️The expression used for selecting a record. If the wildcard value (*) is given in place of an expression, all records will be selected.

Returns

The take_any aggregation function returns the values of the expressions calculated for each of the records selected Indeterministically from each group of the summarize operator.

If the * argument is provided, the function behaves as if the expressions are all columns of the input to the summarize operator barring the group-by columns, if any.

Remarks

This function is useful when you want to get a sample value of one or more columns per value of the compound group key.

When the function is provided with a single column reference, it will attempt to return a non-null/non-empty value, if such value is present.

As a result of the indeterministic nature of this function, using this function multiple times in a single application of the summarize operator isn’t equivalent to using this function a single time with multiple expressions. The former may have each application select a different record, while the latter guarantees that all values are calculated over a single record (per distinct group).

Examples

Show indeterministic State:

StormEvents
| summarize take_any(State)

Output

State
ATLANTIC SOUTH

Show all the details for a random record:

StormEvents
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*)

Output

StartTimeEpisodeIdStateEventType
2007-09-29 08:11:00.000000011091ATLANTIC SOUTHWaterspout

Show all the details of a random record for each State starting with ‘A’:

StormEvents
| where State startswith "A"
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*) by State

Output

StateStartTimeEpisodeIdEventType
ALASKA2007-02-01 00:00:00.00000001733Flood
ATLANTIC SOUTH2007-09-29 08:11:00.000000011091Waterspout
ATLANTIC NORTH2007-11-27 00:00:00.000000011523Marine Thunderstorm Wind
ARIZONA2007-12-01 10:40:00.000000011955Flash Flood
AMERICAN SAMOA2007-12-07 14:00:00.000000013183Flash Flood
ARKANSAS2007-12-09 16:00:00.000000011319Lightning
ALABAMA2007-12-15 18:00:00.000000012580Heavy Rain

38 - take_anyif() (aggregation function)

Learn how to use the take_anyif() function to return the value of an arbitrarily selected record for which the predicate is ’true'.

Arbitrarily selects one record for each group in a summarize operator in records for which the predicate is ’true’. The function returns the value of an expression over each such record.

This function is useful when you want to get a sample value of one column per value of the compound group key, subject to some predicate that is true. If such a value is present, the function attempts to return a non-null/non-empty value.

Syntax

take_anyif( expr, predicate )

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for selecting a record.
predicatestring✔️Indicates which records may be considered for evaluation.

Returns

The take_anyif aggregation function returns the value of the expression calculated for each of the records randomly selected from each group of the summarize operator. Only records for which predicate returns ’true’ may be selected. If the predicate doesn’t return ’true’, a null value is produced.

Examples

Pick a random EventType from Storm events, where event description has a key phrase.

StormEvents
| summarize take_anyif(EventType, EventNarrative has 'strong wind')

Output

EventType
Strong Wind

39 - tdigest_merge() (aggregation functions)

Learn how to use the tdigest_merge() aggregation function to merge tdigest results across the group.

Merges tdigest results across the group.

For more information about the underlying algorithm (T-Digest) and the estimated error, see estimation error in percentiles.

Syntax

tdigest_merge(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.

Returns

Returns the merged tdigest values of expr across the group.

Example

StormEvents
| summarize PreAggDamageProperty=tdigest(DamageProperty) by State
| summarize tdigest_merge(PreAggDamageProperty)

Output

merge_tdigests_PreAggDamageProperty
[[7],[91,30,73667,966,110000000,24428,2500,20000,16500000,6292,40000,123208,1000000,133091,90583,20000000,977000,20007,547000,19000000,1221,9600000,300000,70072,55940,75000,417500,1410000,20400000,331500,15000000,62000000,50222,121690000,160400,6200000,252500,450,11000000,2200000,5700000,11566,12000000,263,50000,200000,3700000,13286,171000,100000000,28200000,65000000,17709,30693,16000000,7938,5200,2875,1500000,3480000,151100000,9800000,18200000,21600000,199,2570000,30000000,38000000,72000,891250,500000000,26385,80092,27000000,35000000,754500,11500000,3262500,113945,5000,62429,175294,9071,6500000,3321,15159,21850000,300000000,22683,3000,10000000,60055,600000,52000000,496000,15000,50000000,10140000,11900000,2100000,62600000,77125,310667,70000000,101000000,2088,1608571,19182,400000,179833,775000,612000,150000000,13500000,2600000,1250000,65400,45000000,297000,2500000,40000000,24846,30000,59067,1893,15762,142571,220666,195000,2000000,355000,2275000,6000000,46000000,38264,50857,4002,97333,27750,1000,1111429,7043,272500,455200,503,37500000,10000,1489,0,1200000,110538,60000000,250000,10730,1901429,291000,698750,649000,2716667,137000000,6400000,29286,41051,6850000,102000,4602,80000000,250000000,371667,8000000,729,8120000,5000000,20830,152400,803300,349667,202000,207000,81150000,48000000,750000,26000000,8900000,239143,75000000,248000,14342,74857,5992,500000,150000,938000,10533333,45248,105000000,7000000,35030,4000000,2000,7692500,3000000,25000000,4500000,87222,12054,100000,25000,9771,4840000,28000000,1307143,32024],[19,1,3,32,1,14,45,572,1,51,126,41,101,11,12,8,2,14,4,1,27,1,58,42,20,177,6,4,1,12,10,2,9,1,5,1,2,28,3,6,1,23,4,30,610,145,1,21,4,2,1,1,24,13,1,153,5,4,26,5,1,6,1,1,28,1,5,1,11,4,1,13,44,2,4,2,1,4,9,1672,7,17,47,2,39,17,2,1,17,666,16,71,21,3,1,530,10,1,1,2,1,4,6,4,1,20,7,11,40,6,2,1,1,2,1,3,5,2,1,21,2,13,271,3,14,23,7,15,2,41,1,2,7,1,27,7,205,3,4,1403,7,69,4,10,215,1,1472,127,45756,10,13,1,198,17,7,1,12,7,6,1,1,14,7,2,2,17,1,2,3,2,48,5,21,10,5,10,21,4,5,1,2,39,2,2,7,1,1,22,7,60,175,119,3,3,40,1,8,101,15,1135,4,22,3,3,9,76,430,611,12,1,2,7,8]]

40 - tdigest() (aggregation function)

Learn how to use the tdigest() (aggregation function) function to calculate the intermediate results of the weighted percentiles of expressions across the group.

Calculates the intermediate results of percentiles() across the group.

For more information, see the underlying algorithm (T-Digest) and the estimated error.

Syntax

tdigest(expr [, weight])

Parameters

NameTypeRequiredDescription
exprstring✔️The expression used for the aggregation calculation.
weightstringThe weights of the values for the aggregation calculation.

Returns

The Intermediate results of weighted percentiles of *expr* across the group.

Examples

Results per state

This example shows the results of the tdigest percentiles sorted by state.

StormEvents
| summarize tdigest(DamageProperty) by State

The results table shown includes only the first 10 rows.

Statetdigest_DamageProperty
NEBRASKA[[7],[800,250,300000,5000,240000,1500000,20000,550000,0,75000,100000,1000,10000,30000,13000,2000000,1000000,650000,125000,35000,7000,2500000,4000000,450000,85000,460000,500000,6000,150000,350000,4000,72500,1200000,180000,400000,25000,50000,2000,45000,8000,120000,200000,40000,1200,15000,55000,3000,250000],[5,1,3,72,1,1,44,1,1351,12,24,17,46,13,6,1,2,1,2,6,8,1,1,1,2,1,4,2,6,1,2,2,1,1,2,26,18,12,2,2,1,7,6,4,28,4,6,6]]
MINNESOTA[[7],[700,500,2000000,2500,1200000,12000000,16000,7000000,0,300000,425000,750,6000,30000,10000,22000000,10000000,9600000,600000,50000,4000,27000000,35000000,4000000,400000,5000000,6000000,3000,750000,2500000,2000,250000,11000000,38000000,3000000,20000,120000,1000,100000,5000,500000,1000000,60000,800,15000,200000,1500,1500000,900000],[1,3,1,3,1,2,1,1,1793,1,1,2,2,2,3,1,1,1,2,2,1,1,1,1,2,1,2,1,1,1,6,1,1,1,3,5,1,5,2,5,2,2,1,2,2,2,2,1,1]]
KANSAS[[7],[667,200,6000000,3400,80000,300000,18875,210000,0,45857,750000,37500000,10000,81150000,15000000,6400000,2570000,225000,59400,25000,5000,400000,7000000,4500000,2500000,6500000,200000,4500,70000,122500,2785,12000000,1900000,18200000,150000,1150000,27000000,2000,30000,2000000,250000000,75000,26000,1500,1500000,1000000,2500,100000,21600000,50000,335000,600000,175000,500000,160000,51000,40000,20000,15000,252500,7520,350000,250000,3400000,1000,338000,16000000,106000,4840000,305000,540000,337500,9800000,45000,12500,700000,4000000,71000,30000000,35000,3700000,22000,56000],[12,2,2,5,2,3,8,1,2751,7,2,1,37,1,1,1,1,2,5,12,33,8,1,1,1,2,10,1,5,2,7,1,4,1,5,1,1,9,11,4,1,5,2,6,4,8,2,23,1,44,2,3,2,3,1,1,1,18,5,2,5,1,7,1,25,1,1,3,1,1,1,2,6,1,1,2,1,1,1,3,1,1,1]]
NEW MEXICO[[7],[600,500,2500000,7000,1500,28000,40000,10000,0,500000,20000,1000,21000,70000,25000,3500000,200000,16500000,50000,100000,15000,4000,5000,2000],[1,3,1,1,1,1,1,7,466,1,7,4,1,1,2,1,1,1,1,2,1,4,10,8]]
KENTUCKY[[7],[600,200,700000,5000,400000,12000,15000,100000,0,60000,80000,1000,9000,20000,10000,50000,30000,300000,120000,25000,7000,3000,500000,11500000,75000,35000,8000,6000,150000,1500000,4000,56000,1911,250000,2500000,18000,45000,2000],[6,2,1,42,1,3,9,8,999,2,1,52,1,21,37,25,7,2,3,14,11,35,1,1,6,10,9,10,4,1,13,1,9,3,1,2,1,37]]
VIRGINIA[[7],[536,500,125000,3000,100000,7250,8000,60000,0,40000,50000,956,6000,11500,7000,25000,15000,98000,70000,12000,4000,2000,120000,1000000,45000,16000,5000,3500,75000,175000,2500,30000,1000,80000,300000,10000,20000,1500],[7,11,1,48,2,2,2,1,1025,2,6,9,2,2,1,5,16,1,3,5,12,122,1,1,1,1,64,2,2,1,1,7,209,3,2,42,19,6]]
OREGON[[7],[5000,1000,60000,434000,20000,50000,100000,500000,0,1500000,20400000,6000,62600000],[8,2,1,1,1,1,3,1,401,1,1,1,1]]
ALASKA[[7],[5000,1000,25000,700000,12060,15000,100000,1600000,0,10000],[5,1,1,1,1,2,1,2,242,1]]
CONNECTICUT[[7],[5000,1000,2000000,0,50000,750000,6000],[1,1,1,142,1,1,1]]
NEVADA[[7],[5000,1000,200000,1000000,30000,40000,297000,5000000,0,10000],[4,2,1,1,1,1,1,1,148,3]]

Convert pre-existing centroids

The following example shows how one can convert pre-existing T-Digest centroids for long-term storage. The V column holds the value of each centroid, and the W column is its weight (relative count). The tdigest() aggregate function is then applied to convert the data in table DT into the internal representation, and percentile_tdigest() is used to demonstrate how ot find the 50-tile value.

let DT=datatable(V:real, W:long) [
    1.0, 1,
    2.0, 2
];
DT
| summarize TD=tdigest(V, W)
| project P50=percentile_tdigest(TD, 50)
P50
2

41 - variance() (aggregation function)

Learn how to use the variance() aggregation function to calculate the sample variance of the expression across the group.

Calculates the variance of expr across the group, considering the group as a sample.

The following formula is used:

Image showing a variance sample formula.

Syntax

variance(expr)

Parameters

NameTypeRequiredDescription
exprreal✔️The expression used for the variance calculation.

Returns

Returns the variance value of expr across the group.

Example

range x from 1 to 5 step 1
| summarize make_list(x), variance(x) 

Output

list_xvariance_x
[ 1, 2, 3, 4, 5]2.5

42 - varianceif() (aggregation function)

Learn how to use the varianceif() function to calculate the variance in an expression where the predicate evaluates to true.

Calculates the variance of expr in records for which predicate evaluates to true.

Syntax

varianceif(expr, predicate)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for the variance calculation.
predicatestring✔️If predicate evaluates to true, the expr calculated value will be added to the variance.

Returns

Returns the variance value of expr in records for which predicate evaluates to true.

Example

range x from 1 to 100 step 1
| summarize varianceif(x, x%2 == 0)

Output

varianceif_x
850

43 - variancep() (aggregation function)

Learn how to use the variancep() aggregation function to calculate the population variance of an expression across the group.

Calculates the variance of expr across the group, considering the group as a population.

The following formula is used:

Image showing a variance sample formula.

Syntax

variancep(expr)

Parameters

NameTypeRequiredDescription
exprstring✔️The expression to use for the variance calculation.

Returns

Returns the variance value of expr across the group.

Example

range x from 1 to 5 step 1
| summarize make_list(x), variancep(x) 

Output

list_xvariance_x
[ 1, 2, 3, 4, 5]2