This is the multi-page printable view of this section. Click here to print.
Aggregation functions
- 1: Aggregation Functions
- 2: arg_max() (aggregation function)
- 3: arg_min() (aggregation function)
- 4: avg() (aggregation function)
- 5: avgif() (aggregation function)
- 6: binary_all_and() (aggregation function)
- 7: binary_all_or() (aggregation function)
- 8: binary_all_xor() (aggregation function)
- 9: buildschema() (aggregation function)
- 10: count_distinct() (aggregation function) - (preview)
- 11: count_distinctif() (aggregation function) - (preview)
- 12: count() (aggregation function)
- 13: countif() (aggregation function)
- 14: dcount() (aggregation function)
- 15: dcountif() (aggregation function)
- 16: hll_if() (aggregation function)
- 17: hll_merge() (aggregation function)
- 18: hll() (aggregation function)
- 19: make_bag_if() (aggregation function)
- 20: make_bag() (aggregation function)
- 21: make_list_if() (aggregation function)
- 22: make_list_with_nulls() (aggregation function)
- 23: make_list() (aggregation function)
- 24: make_set_if() (aggregation function)
- 25: make_set() (aggregation function)
- 26: max() (aggregation function)
- 27: maxif() (aggregation function)
- 28: min() (aggregation function)
- 29: minif() (aggregation function)
- 30: percentile(), percentiles()
- 31: percentilew(), percentilesw()
- 32: stdev() (aggregation function)
- 33: stdevif() (aggregation function)
- 34: stdevp() (aggregation function)
- 35: sum() (aggregation function)
- 36: sumif() (aggregation function)
- 37: take_any() (aggregation function)
- 38: take_anyif() (aggregation function)
- 39: tdigest_merge() (aggregation functions)
- 40: tdigest() (aggregation function)
- 41: variance() (aggregation function)
- 42: varianceif() (aggregation function)
- 43: variancep() (aggregation function)
1 - Aggregation Functions
An aggregation function performs a calculation on a set of values, and returns a single value. These functions are used in conjunction with the summarize operator. This article lists all available aggregation functions grouped by type. For scalar functions, see Scalar function types.
Binary functions
Function Name | Description |
---|---|
binary_all_and() | Returns aggregated value using the binary AND of the group. |
binary_all_or() | Returns aggregated value using the binary OR of the group. |
binary_all_xor() | Returns aggregated value using the binary XOR of the group. |
Dynamic functions
Function Name | Description |
---|---|
buildschema() | Returns the minimal schema that admits all values of the dynamic input. |
make_bag(), make_bag_if() | Returns a property bag of dynamic values within the group without/with a predicate. |
make_list(), make_list_if() | Returns a list of all the values within the group without/with a predicate. |
make_list_with_nulls() | Returns a list of all the values within the group, including null values. |
make_set(), make_set_if() | Returns a set of distinct values within the group without/with a predicate. |
Row selector functions
Function Name | Description |
---|---|
arg_max() | Returns one or more expressions when the argument is maximized. |
arg_min() | Returns one or more expressions when the argument is minimized. |
take_any(), take_anyif() | Returns a random non-empty value for the group without/with a predicate. |
Statistical functions
Function Name | Description |
---|---|
avg() | Returns an average value across the group. |
avgif() | Returns an average value across the group (with predicate). |
count(), countif() | Returns a count of the group without/with a predicate. |
count_distinct(), count_distinctif() | Returns a count of unique elements in the group without/with a predicate. |
dcount(), dcountif() | Returns an approximate distinct count of the group elements without/with a predicate. |
hll() | Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation. |
hll_if() | Returns the HyperLogLog (HLL) results of the group elements, an intermediate value of the dcount approximation (with predicate). |
hll_merge() | Returns a value for merged HLL results. |
max(), maxif() | Returns the maximum value across the group without/with a predicate. |
min(), minif() | Returns the minimum value across the group without/with a predicate. |
percentile() | Returns a percentile estimation of the group. |
percentiles() | Returns percentile estimations of the group. |
percentiles_array() | Returns the percentile approximates of the array. |
percentilesw() | Returns the weighted percentile approximate of the group. |
percentilesw_array() | Returns the weighted percentile approximate of the array. |
stdev(), stdevif() | Returns the standard deviation across the group for a population that is considered a sample without/with a predicate. |
stdevp() | Returns the standard deviation across the group for a population that is considered representative. |
sum(), sumif() | Returns the sum of the elements within the group without/with a predicate. |
tdigest() | Returns an intermediate result for the percentiles approximation, the weighted percentile approximate of the group. |
tdigest_merge() | Returns the merged tdigest value across the group. |
variance(), varianceif() | Returns the variance across the group without/with a predicate. |
variancep() | Returns the variance across the group for a population that is considered representative. |
2 - arg_max() (aggregation function)
Finds a row in the table that maximizes the specified expression. It returns all columns of the input table or specified columns.
Syntax
arg_max
(
ExprToMaximize,
* | ExprToReturn [,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ExprToMaximize | string | ✔️ | The expression for which the maximum value is determined. |
ExprToReturn | string | ✔️ | The expression determines which columns’ values are returned, from the row that has the maximum value for ExprToMaximize. Use a wildcard * to return all columns. |
Returns
Returns a row in the table that maximizes the specified expression ExprToMaximize, and the values of columns specified in ExprToReturn.
Examples
Find maximum latitude
The following example finds the maximum latitude of a storm event in each state.
StormEvents
| summarize arg_max(BeginLat, BeginLocation) by State
Output
The results table displays only the first 10 rows.
State | BeginLat | BeginLocation |
---|---|---|
MISSISSIPPI | 34.97 | BARTON |
VERMONT | 45 | NORTH TROY |
AMERICAN SAMOA | -14.2 | OFU |
HAWAII | 22.2113 | PRINCEVILLE |
MINNESOTA | 49.35 | ARNESEN |
RHODE ISLAND | 42 | WOONSOCKET |
INDIANA | 41.73 | FREMONT |
WEST VIRGINIA | 40.62 | CHESTER |
SOUTH CAROLINA | 35.18 | LANDRUM |
TEXAS | 36.4607 | DARROUZETT |
… | … | … |
Find last state fatal event
The following example finds the last time an event with a direct death happened in each state, showing all the columns.
The query first filters the events to include only those events where there was at least one direct death. Then the query returns the entire row with the most recent StartTime
.
StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *) by State
Output
The results table displays only the first 10 rows and first three columns.
State | StartTime | EndTime | … |
---|---|---|---|
GUAM | 2007-01-27T11:15:00Z | 2007-01-27T11:30:00Z | … |
MASSACHUSETTS | 2007-02-03T22:00:00Z | 2007-02-04T10:00:00Z | … |
AMERICAN SAMOA | 2007-02-17T13:00:00Z | 2007-02-18T11:00:00Z | … |
IDAHO | 2007-02-17T13:00:00Z | 2007-02-17T15:00:00Z | … |
DELAWARE | 2007-02-25T13:00:00Z | 2007-02-26T01:00:00Z | … |
WYOMING | 2007-03-10T17:00:00Z | 2007-03-10T17:00:00Z | … |
NEW MEXICO | 2007-03-23T18:42:00Z | 2007-03-23T19:06:00Z | … |
INDIANA | 2007-05-15T14:14:00Z | 2007-05-15T14:14:00Z | … |
MONTANA | 2007-05-18T14:20:00Z | 2007-05-18T14:20:00Z | … |
LAKE MICHIGAN | 2007-06-07T13:00:00Z | 2007-06-07T13:00:00Z | … |
… | … | … | … |
Handle nulls
The following example demonstrates null handling.
datatable(Fruit: string, Color: string, Version: int) [
"Apple", "Red", 1,
"Apple", "Green", int(null),
"Banana", "Yellow", int(null),
"Banana", "Green", int(null),
"Pear", "Brown", 1,
"Pear", "Green", 2,
]
| summarize arg_max(Version, *) by Fruit
Output
Fruit | Version | Color |
---|---|---|
Apple | 1 | Red |
Banana | Yellow | |
Pear | 2 | Green |
Comparison to max()
The arg_max() function differs from the max() function. The arg_max() function allows you to return other columns along with the maximum value, and max() only returns the maximum value itself.
Examples
arg_max()
Find the last time an event with a direct death happened, showing all the columns in the table.
The query first filters the events to only include events where there was at least one direct death. Then the query returns the entire row with the most recent (maximum) StartTime.
StormEvents
| where DeathsDirect > 0
| summarize arg_max(StartTime, *)
The results table returns all the columns for the row containing the highest value in the expression specified.
| StartTime | EndTime | EpisodeId | EventId | State | EventType | … | |–|–|–|–| | 2007-12-31T15:00:00Z | 2007-12-31T15:00:00 | 12688 | 69700 | UTAH | Avalanche | … |
max()
Find the last time an event with a direct death happened.
The query filters events to only include events where there is at least one direct death, and then returns the maximum value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize max(StartTime)
The results table returns the maximum value of StartTime, without returning other columns for this record.
max_StartTime |
---|
2007-12-31T15:00:00Z |
Related content
3 - arg_min() (aggregation function)
Finds a row in the table that minimizes the specified expression. It returns all columns of the input table or specified columns.
Syntax
arg_min
(
ExprToMinimize,
* | ExprToReturn [,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
ExprToMinimize | string | ✔️ | The expression for which the minimum value is determined. |
ExprToReturn | string | ✔️ | The expression determines which columns’ values are returned, from the row that has the minimum value for ExprToMinimize. Use a wildcard * to return all columns. |
Null handling
When ExprToMinimize is null for all rows in a table, one row in the table is picked. Otherwise, rows where ExprToMinimize is null are ignored.
Returns
Returns a row in the table that minimizes ExprToMinimize, and the values of columns specified in ExprToReturn. Use or *
to return the entire row.
Examples
Find the minimum latitude of a storm event in each state.
StormEvents
| summarize arg_min(BeginLat, BeginLocation) by State
Output
The results table shown includes only the first 10 rows.
State | BeginLat | BeginLocation |
---|---|---|
AMERICAN SAMOA | -14.3 | PAGO PAGO |
CALIFORNIA | 32.5709 | NESTOR |
MINNESOTA | 43.5 | BIGELOW |
WASHINGTON | 45.58 | WASHOUGAL |
GEORGIA | 30.67 | FARGO |
ILLINOIS | 37 | CAIRO |
FLORIDA | 24.6611 | SUGARLOAF KEY |
KENTUCKY | 36.5 | HAZEL |
TEXAS | 25.92 | BROWNSVILLE |
OHIO | 38.42 | SOUTH PT |
… | … | … |
Find the first time an event with a direct death happened in each state, showing all of the columns.
The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *) by State
Output
The results table shown includes only the first 10 rows and first 3 columns.
State | StartTime | EndTime | … |
---|---|---|---|
INDIANA | 2007-01-01T00:00:00Z | 2007-01-22T18:49:00Z | … |
FLORIDA | 2007-01-03T10:55:00Z | 2007-01-03T10:55:00Z | … |
NEVADA | 2007-01-04T09:00:00Z | 2007-01-05T14:00:00Z | … |
LOUISIANA | 2007-01-04T15:45:00Z | 2007-01-04T15:52:00Z | … |
WASHINGTON | 2007-01-09T17:00:00Z | 2007-01-09T18:00:00Z | … |
CALIFORNIA | 2007-01-11T22:00:00Z | 2007-01-24T10:00:00Z | … |
OKLAHOMA | 2007-01-12T00:00:00Z | 2007-01-18T23:59:00Z | … |
MISSOURI | 2007-01-13T03:00:00Z | 2007-01-13T08:30:00Z | … |
TEXAS | 2007-01-13T10:30:00Z | 2007-01-13T14:30:00Z | … |
ARKANSAS | 2007-01-14T03:00:00Z | 2007-01-14T03:00:00Z | … |
… | … | … | … |
The following example demonstrates null handling.
datatable(Fruit: string, Color: string, Version: int) [
"Apple", "Red", 1,
"Apple", "Green", int(null),
"Banana", "Yellow", int(null),
"Banana", "Green", int(null),
"Pear", "Brown", 1,
"Pear", "Green", 2,
]
| summarize arg_min(Version, *) by Fruit
Output
Fruit | Version | Color |
---|---|---|
Apple | 1 | Red |
Banana | Yellow | |
Pear | 1 | Brown |
Comparison to min()
The arg_min() function differs from the min() function. The arg_min() function allows you to return additional columns along with the minimum value, and min() only returns the minimum value itself.
Examples
arg_min()
Find the first time an event with a direct death happened, showing all the columns in the table.
The query first filters the events to only include those where there was at least one direct death. Then the query returns the entire row with the lowest value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize arg_min(StartTime, *)
The results table returns all the columns for the row containing the lowest value in the expression specified.
| StartTime | EndTime | EpisodeId | EventId | State | EventType | … | |–|–|–|–| | 2007-01-01T00:00:00Z | 2007-01-22T18:49:00Z | 2408 | 11929 | INDIANA | Flood | … |
min()
Find the first time an event with a direct death happened.
The query filters events to only include those where there is at least one direct death, and then returns the minimum value for StartTime.
StormEvents
| where DeathsDirect > 0
| summarize min(StartTime)
The results table returns the lowest value in the specific column only.
min_StartTime |
---|
2007-01-01T00:00:00Z |
Related content
4 - avg() (aggregation function)
Calculates the average (arithmetic mean) of expr across the group.
Syntax
avg(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation. |
Returns
Returns the average value of expr across the group.
Example
The following example returns the average number of damaged crops per state.
StormEvents
| summarize AvgDamageToCrops = avg(DamageCrops) by State
The results table shown includes only the first 10 rows.
State | AvgDamageToCrops |
---|---|
TEXAS | 7524.569241 |
KANSAS | 15366.86671 |
IOWA | 4332.477535 |
ILLINOIS | 44568.00198 |
MISSOURI | 340719.2212 |
GEORGIA | 490702.5214 |
MINNESOTA | 2835.991494 |
WISCONSIN | 17764.37838 |
NEBRASKA | 21366.36467 |
NEW YORK | 5.714285714 |
… | … |
Related content
5 - avgif() (aggregation function)
Calculates the average of expr in records for which predicate evaluates to true
.
Syntax
avgif
(
expr,
predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for aggregation calculation. Records with null values are ignored and not included in the calculation. |
predicate | string | ✔️ | The predicate that if true, the expr calculated value is added to the average. |
Returns
Returns the average value of expr in records where predicate evaluates to true
.
Example
The following example calculates the average damage by state in cases where there was any damage.
StormEvents
| summarize Averagedamage=tolong(avg( DamageCrops)),AverageWhenDamage=tolong(avgif(DamageCrops,DamageCrops >0)) by State
Output
The results table shown includes only the first 10 rows.
State | Averagedamage | Averagewhendamage |
---|---|---|
TEXAS | 7524 | 491291 |
KANSAS | 15366 | 695021 |
IOWA | 4332 | 28203 |
ILLINOIS | 44568 | 2574757 |
MISSOURI | 340719 | 8806281 |
GEORGIA | 490702 | 57239005 |
MINNESOTA | 2835 | 144175 |
WISCONSIN | 17764 | 438188 |
NEBRASKA | 21366 | 187726 |
NEW YORK | 5 | 10000 |
… | … | … |
Related content
6 - binary_all_and() (aggregation function)
Accumulates values using the binary AND
operation for each summarization group, or in total if a group isn’t specified.
Syntax
binary_all_and
(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | long | ✔️ | The value used for the binary AND calculation. |
Returns
Returns an aggregated value using the binary AND
operation over records for each summarization group, or in total if a group isn’t specified.
Example
The following example produces CAFEF00D
using binary AND
operations:
datatable(num:long)
[
0xFFFFFFFF,
0xFFFFF00F,
0xCFFFFFFD,
0xFAFEFFFF,
]
| summarize result = toupper(tohex(binary_all_and(num)))
Output
result |
---|
CAFEF00D |
Related content
7 - binary_all_or() (aggregation function)
Accumulates values using the binary OR
operation for each summarization group, or in total if a group isn’t specified.
Syntax
binary_all_or
(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | long | ✔️ | The value used for the binary OR calculation. |
Returns
Returns an aggregated value using the binary OR
operation over records for each summarization group, or in total if a group isn’t specified.
Example
The following example produces CAFEF00D
using binary OR
operations:
datatable(num:long)
[
0x88888008,
0x42000000,
0x00767000,
0x00000005,
]
| summarize result = toupper(tohex(binary_all_or(num)))
Output
result |
---|
CAFEF00D |
Related content
8 - binary_all_xor() (aggregation function)
Accumulates values using the binary XOR
operation for each summarization group, or in total if a group is not specified.
Syntax
binary_all_xor
(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | long | ✔️ | The value used for the binary XOR calculation. |
Returns
Returns a value that is aggregated using the binary XOR
operation over records for each summarization group, or in total if a group isn’t specified.
Example
The following example produces CAFEF00D
using binary XOR
operations:
datatable(num:long)
[
0x44404440,
0x1E1E1E1E,
0x90ABBA09,
0x000B105A,
]
| summarize result = toupper(tohex(binary_all_xor(num)))
Output
results |
---|
CAFEF00D |
Related content
9 - buildschema() (aggregation function)
Builds the minimal schema that admits all values of DynamicExpr.
Syntax
buildschema
(
DynamicExpr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
DynamicExpr | dynamic | ✔️ | Expression used for the aggregation calculation. |
Returns
Returns the minimal schema that admits all values of DynamicExpr.
Example
The following example builds a schema based on:
{"x":1, "y":3.5}
{"x":"somevalue", "z":[1, 2, 3]}
{"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]}
datatable(value: dynamic) [
dynamic({"x":1, "y":3.5}),
dynamic({"x":"somevalue", "z":[1, 2, 3]}),
dynamic({"y":{"w":"zzz"}, "t":["aa", "bb"], "z":["foo"]})
]
| summarize buildschema(value)
Results
schema_value |
---|
{“x”:[“long”,“string”],“y”:[“double”,{“w”:“string”}],“z”:{"indexer ":[“long”,“string”]},“t”:{"indexer ":“string”}} |
Schema breakdown
In the resulting schema:
- The root object is a container with four properties named
x
,y
,z
, andt
. - Property
x
is either type long or type string. - Property
y
is either type double or another container with a propertyw
of type string. - Property
z
is an array, indicated by theindexer
keyword, where each item can be either type long or type string. - Property
t
is an array, indicated by theindexer
keyword, where each item is a string. - Every property is implicitly optional, and any array might be empty.
Related content
10 - count_distinct() (aggregation function) - (preview)
Counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted.
If you only need an estimation of unique values count, we recommend using the less resource-consuming dcount
aggregation function.
To count only records for which a predicate returns true
, use the count_distinctif aggregation function.
Syntax
count_distinct
(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | scalar | ✔️ | The expression whose unique values are to be counted. |
Returns
Long integer value indicating the number of unique values of expr per summary group.
Example
The following example shows how many types of storm events happened in each state.
Function performance can be degraded when operating on multiple data sources from different clusters.
StormEvents
| summarize UniqueEvents=count_distinct(EventType) by State
| top 5 by UniqueEvents
Output
State | UniqueEvents |
---|---|
TEXAS | 27 |
CALIFORNIA | 26 |
PENNSYLVANIA | 25 |
GEORGIA | 24 |
NORTH CAROLINA | 23 |
Related content
11 - count_distinctif() (aggregation function) - (preview)
Conditionally counts unique values specified by the scalar expression per summary group, or the total number of unique values if the summary group is omitted. Only records for which predicate evaluates to true
are counted.
If you only need an estimation of unique values count, we recommend using the less resource-consuming dcountif aggregation function.
Syntax
count_distinctif
(
expr,
predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | scalar | ✔️ | The expression whose unique values are to be counted. |
predicate | string | ✔️ | The expression used to filter records to be aggregated. |
Returns
Integer value indicating the number of unique values of expr per summary group, for all records for which the predicate evaluates to true
.
Example
The following example shows how many types of death-causing storm events happened in each state. Only storm events with a nonzero count of deaths are counted.
StormEvents
| summarize UniqueFatalEvents=count_distinctif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where UniqueFatalEvents > 0
| top 5 by UniqueFatalEvents
Output
State | UniqueFatalEvents |
---|---|
TEXAS | 12 |
CALIFORNIA | 12 |
OKLAHOMA | 10 |
NEW YORK | 9 |
KANSAS | 9 |
Related content
12 - count() (aggregation function)
Counts the number of records per summarization group, or total if summarization is done without grouping.
To only count records for which a predicate returns true
, use countif().
Syntax
count()
Returns
Returns a count of the records per summarization group, or in total if summarization is done without grouping.
Example
The following example returns a count of events in states:
StormEvents
| summarize Count=count() by State
Output
State | Count |
---|---|
TEXAS | 4701 |
KANSAS | 3166 |
IOWA | 2337 |
ILLINOIS | 2022 |
MISSOURI | 2016 |
GEORGIA | 1983 |
MINNESOTA | 1881 |
WISCONSIN | 1850 |
NEBRASKA | 1766 |
NEW YORK | 1750 |
… | … |
Related content
13 - countif() (aggregation function)
Counts the rows in which predicate evaluates to true
.
Syntax
countif
(
predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
predicate | string | ✔️ | The expression used for aggregation calculation. The value can be any scalar expression with a return type of bool. |
Returns
Returns a count of rows in which predicate evaluates to true
.
Examples
Count storms by state
This example shows the number of storms with damage to crops by state.
StormEvents
| summarize TotalCount=count(),TotalWithDamage=countif(DamageCrops >0) by State
The results table shown includes only the first 10 rows.
State | TotalCount | TotalWithDamage |
---|---|---|
TEXAS | 4701 | 72 |
KANSAS | 3166 | 70 |
IOWA | 2337 | 359 |
ILLINOIS | 2022 | 35 |
MISSOURI | 2016 | 78 |
GEORGIA | 1983 | 17 |
MINNESOTA | 1881 | 37 |
WISCONSIN | 1850 | 75 |
NEBRASKA | 1766 | 201 |
NEW YORK | 1750 | 1 |
… | … | … |
Count based on string length
This example shows the number of names with more than four letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize countif(strlen(name) > 4)
Output
countif_ |
---|
2 |
Related content
14 - dcount() (aggregation function)
Calculates an estimate of the number of distinct values that are taken by a scalar expression in the summary group.
Syntax
dcount
(
expr[,
accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The input whose distinct values are to be counted. |
accuracy | int | The value that defines the requested estimation accuracy. The default value is 1 . See Estimation accuracy for supported values. |
Returns
Returns an estimate of the number of distinct values of expr in the group.
Example
This example shows how many types of storm events happened in each state.
StormEvents
| summarize DifferentEvents=dcount(EventType) by State
| order by DifferentEvents
The results table shown includes only the first 10 rows.
State | DifferentEvents |
---|---|
TEXAS | 27 |
CALIFORNIA | 26 |
PENNSYLVANIA | 25 |
GEORGIA | 24 |
ILLINOIS | 23 |
MARYLAND | 23 |
NORTH CAROLINA | 23 |
MICHIGAN | 22 |
FLORIDA | 22 |
OREGON | 21 |
KANSAS | 21 |
… | … |
Estimation accuracy
Related content
15 - dcountif() (aggregation function)
Estimates the number of distinct values of expr for rows in which predicate evaluates to true
.
Syntax
dcountif
(
expr, predicate, [,
accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The expression used to filter rows. |
accuracy | int | The control between speed and accuracy. If unspecified, the default value is 1 . See Estimation accuracy for supported values. |
Returns
Returns an estimate of the number of distinct values of expr for rows in which predicate evaluates to true
.
Example
This example shows how many types of fatal storm events happened in each state.
StormEvents
| summarize DifferentFatalEvents=dcountif(EventType,(DeathsDirect + DeathsIndirect)>0) by State
| where DifferentFatalEvents > 0
| order by DifferentFatalEvents
The results table shown includes only the first 10 rows.
State | DifferentFatalEvents |
---|---|
CALIFORNIA | 12 |
TEXAS | 12 |
OKLAHOMA | 10 |
ILLINOIS | 9 |
KANSAS | 9 |
NEW YORK | 9 |
NEW JERSEY | 7 |
WASHINGTON | 7 |
MICHIGAN | 7 |
MISSOURI | 7 |
… | … |
Estimation accuracy
Related content
16 - hll_if() (aggregation function)
Calculates the intermediate results of dcount
in records for which the predicate evaluates to true
.
Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.
Syntax
hll_if
(
expr, predicate [,
accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The Expr used to filter records to add to the intermediate result of dcount . |
accuracy | int | The value that controls the balance between speed and accuracy. If unspecified, the default value is 1 . For supported values, see Estimation accuracy. |
Returns
Returns the intermediate results of distinct count of Expr for which Predicate evaluates to true
.
Examples
The following query results in the number of unique flood event sources in Iowa and Kansas. It uses the hll_if()
function to show only flood events.
StormEvents
| where State in ("IOWA", "KANSAS")
| summarize hll_flood = hll_if(Source, EventType == "Flood") by State
| project State, SourcesOfFloodEvents = dcount_hll(hll_flood)
Output
State | SourcesOfFloodEvents |
---|---|
KANSAS | 11 |
IOWA | 7 |
Estimation accuracy
Accuracy | Speed | Error (%) | |
---|---|---|---|
0 | Fastest | 1.6 | |
1 | Balanced | 0.8 | |
2 | Slow | 0.4 | |
3 | Slow | 0.28 | |
4 | Slowest | 0.2 |
Related content
17 - hll_merge() (aggregation function)
Merges HLL results across the group into a single HLL value.
For more information, see the underlying algorithm (HyperLogLog) and estimation accuracy.
Syntax
hll_merge
(
hll)
Parameters
Name | Type | Required | Description |
---|---|---|---|
hll | string | ✔️ | The column name containing HLL values to merge. |
Returns
The function returns the merged HLL values of hll across the group.
Example
The following example shows HLL results across a group merged into a single HLL value.
StormEvents
| summarize hllRes = hll(DamageProperty) by bin(StartTime,10m)
| summarize hllMerged = hll_merge(hllRes)
Output
The results show only the first five results in the array.
hllMerged |
---|
[[1024,14],["-6903255281122589438","-7413697181929588220","-2396604341988936699",“5824198135224880646”,"-6257421034880415225", …],[]] |
Estimation accuracy
Related content
18 - hll() (aggregation function)
The hll()
function is a way to estimate the number of unique values in a set of values. It does so by calculating intermediate results for aggregation within the summarize operator for a group of data using the dcount
function.
Read about the underlying algorithm (HyperLogLog) and the estimation accuracy.
Syntax
hll
(
expr [,
accuracy])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
accuracy | int | The value that controls the balance between speed and accuracy. If unspecified, the default value is 1 . For supported values, see Estimation accuracy. |
Returns
Returns the intermediate results of distinct count of expr across the group.
Example
In the following example, the hll()
function is used to estimate the number of unique values of the DamageProperty
column within each 10-minute time bin of the StartTime
column.
StormEvents
| summarize hll(DamageProperty) by bin(StartTime,10m)
Output
The results table shown includes only the first 10 rows.
StartTime | hll_DamageProperty |
---|---|
2007-01-01T00:20:00Z | [[1024,14],[“3803688792395291579”],[]] |
2007-01-01T01:00:00Z | [[1024,14],[“7755241107725382121”,"-5665157283053373866",“3803688792395291579”,"-1003235211361077779"],[]] |
2007-01-01T02:00:00Z | [[1024,14],["-1003235211361077779","-5665157283053373866",“7755241107725382121”],[]] |
2007-01-01T02:20:00Z | [[1024,14],[“7755241107725382121”],[]] |
2007-01-01T03:30:00Z | [[1024,14],[“3803688792395291579”],[]] |
2007-01-01T03:40:00Z | [[1024,14],["-5665157283053373866"],[]] |
2007-01-01T04:30:00Z | [[1024,14],[“3803688792395291579”],[]] |
2007-01-01T05:30:00Z | [[1024,14],[“3803688792395291579”],[]] |
2007-01-01T06:30:00Z | [[1024,14],[“1589522558235929902”],[]] |
Estimation accuracy
Related content
19 - make_bag_if() (aggregation function)
Creates a dynamic
JSON property bag (dictionary) of expr values in records for which predicate evaluates to true
.
Syntax
make_bag_if(
expr,
predicate [,
maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | dynamic | ✔️ | The expression used for the aggregation calculation. |
predicate | bool | ✔️ | The predicate that evaluates to true , in order for expr to be added to the result. |
maxSize | int | The limit on the maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic
JSON property bag (dictionary) of expr values in records for which predicate evaluates to true
. Nondictionary values are skipped.
If a key appears in more than one row, an arbitrary value, out of the possible values for this key, are selected.
Example
The following example shows a packed JSON property bag.
let T = datatable(prop:string, value:string, predicate:bool)
[
"prop01", "val_a", true,
"prop02", "val_b", false,
"prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag_if(p, predicate)
Output
dict |
---|
{ “prop01”: “val_a”, “prop03”: “val_c” } |
Use bag_unpack() plugin for transforming the bag keys in the make_bag_if() output into columns.
let T = datatable(prop:string, value:string, predicate:bool)
[
"prop01", "val_a", true,
"prop02", "val_b", false,
"prop03", "val_c", true
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag_if(p, predicate)
| evaluate bag_unpack(bag)
Output
prop01 | prop03 |
---|---|
val_a | val_c |
Related content
20 - make_bag() (aggregation function)
Creates a dynamic
JSON property bag (dictionary) of all the values of expr in the group.
Syntax
make_bag
(
expr [,
maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | dynamic | ✔️ | The expression used for the aggregation calculation. |
maxSize | int | The limit on the maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic
JSON property bag (dictionary) of all the values of Expr in the group, which are property bags. Nondictionary values are skipped.
If a key appears in more than one row, an arbitrary value, out of the possible values for this key, is selected.
Example
The following example shows a packed JSON property bag.
let T = datatable(prop:string, value:string)
[
"prop01", "val_a",
"prop02", "val_b",
"prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize dict=make_bag(p)
Output
dict |
---|
{ “prop01”: “val_a”, “prop02”: “val_b”, “prop03”: “val_c” } |
Use the bag_unpack() plugin for transforming the bag keys in the make_bag() output into columns.
let T = datatable(prop:string, value:string)
[
"prop01", "val_a",
"prop02", "val_b",
"prop03", "val_c",
];
T
| extend p = bag_pack(prop, value)
| summarize bag=make_bag(p)
| evaluate bag_unpack(bag)
Output
prop01 | prop02 | prop03 |
---|---|---|
val_a | val_b | val_c |
Related content
21 - make_list_if() (aggregation function)
Creates a dynamic
array of expr values in the group for which predicate evaluates to true
.
Syntax
make_list_if(
expr,
predicate [,
maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | A predicate that has to evaluate to true in order for expr to be added to the result. |
maxSize | integer | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic
array of expr values in the group for which predicate evaluates to true
.
If the input to the summarize
operator isn’t sorted, the order of elements in the resulting array is undefined.
If the input to the summarize
operator is sorted, the order of elements in the resulting array tracks that of the input.
Example
The following example shows a list of names with more than 4 letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize make_list_if(name, strlen(name) > 4)
Output
list_name |
---|
[“George”, “Ringo”] |
Related content
22 - make_list_with_nulls() (aggregation function)
dynamic
JSON object (array) which includes null values.Creates a dynamic
array of all the values of expr in the group, including null values.
Syntax
make_list_with_nulls(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression that to use to create the array. |
Returns
Returns a dynamic
JSON object (array) of all the values of expr in the group, including null values.
If the input to the summarize
operator isn’t sorted, the order of elements in the resulting array is undefined.
If the input to the summarize
operator is sorted, the order of elements in the resulting array tracks that of the input.
Example
The following example shows null values in the results.
let shapes = datatable (name:string , sideCount: int)
[
"triangle", int(null),
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| summarize mylist = make_list_with_nulls(sideCount)
Output
mylist |
---|
[null,4,4,5,6,7,8,9,10] |
23 - make_list() (aggregation function)
Creates a dynamic
array of all the values of expr in the group.
Syntax
make_list(
expr [,
maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | dynamic | ✔️ | The expression used for the aggregation calculation. |
maxSize | int | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic
array of all the values of expr in the group.
If the input to the summarize
operator isn’t sorted, the order of elements in the resulting array is undefined.
If the input to the summarize
operator is sorted, the order of elements in the resulting array tracks that of the input.
Examples
The examples in this section show how to use the syntax to help you get started.
One column
The following example uses the datatable, shapes
, to return a list of shapes in a single column.
let shapes = datatable (name: string, sideCount: int)
[
"triangle", 3,
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| summarize mylist = make_list(name)
Output
mylist |
---|
[“triangle”,“square”,“rectangle”,“pentagon”,“hexagon”,“heptagon”,“octagon”,“nonagon”,“decagon”] |
Using the ‘by’ clause
The following example uses the make_list
function and the by
clause to create two lists of objects grouped by whether they have an even or odd number of sides.
let shapes = datatable (name: string, sideCount: int)
[
"triangle", 3,
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| summarize mylist = make_list(name) by isEvenSideCount = sideCount % 2 == 0
Output
isEvenSideCount | mylist |
---|---|
false | [“triangle”,“pentagon”,“heptagon”,“nonagon”] |
true | [“square”,“rectangle”,“hexagon”,“octagon”,“decagon”] |
Packing a dynamic object
The following examples show how to pack a dynamic object in a column before making it a list. It returns a column with a boolean table isEvenSideCount
indicating whether the side count is even or odd and a mylist
column that contains lists of packed bags int each category.
let shapes = datatable (name: string, sideCount: int)
[
"triangle", 3,
"square", 4,
"rectangle", 4,
"pentagon", 5,
"hexagon", 6,
"heptagon", 7,
"octagon", 8,
"nonagon", 9,
"decagon", 10
];
shapes
| extend d = bag_pack("name", name, "sideCount", sideCount)
| summarize mylist = make_list(d) by isEvenSideCount = sideCount % 2 == 0
Output
isEvenSideCount | mylist |
---|---|
false | [{“name”:“triangle”,“sideCount”:3},{“name”:“pentagon”,“sideCount”:5},{“name”:“heptagon”,“sideCount”:7},{“name”:“nonagon”,“sideCount”:9}] |
true | [{“name”:“square”,“sideCount”:4},{“name”:“rectangle”,“sideCount”:4},{“name”:“hexagon”,“sideCount”:6},{“name”:“octagon”,“sideCount”:8},{“name”:“decagon”,“sideCount”:10}] |
Related content
24 - make_set_if() (aggregation function)
Creates a dynamic
array of the set of distinct values that expr takes in records for which predicate evaluates to true
.
Syntax
make_set_if(
expr,
predicate [,
maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | A predicate that has to evaluate to true in order for expr to be added to the result. |
maxSize | int | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic
array of the set of distinct values that expr takes in records for which predicate evaluates to true
. The array’s sort order is undefined.
Example
The following example shows a list of names with more than four letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize make_set_if(name, strlen(name) > 4)
Output
set_name |
---|
[“George”, “Ringo”] |
Related content
25 - make_set() (aggregation function)
Creates a dynamic
array of the set of distinct values that expr takes in the group.
Syntax
make_set(
expr [,
maxSize])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
maxSize | int | The maximum number of elements returned. The default and max value is 1048576. |
Returns
Returns a dynamic
array of the set of distinct values that expr takes in the group.
The array’s sort order is undefined.
Example
Set from a scalar column
The following example shows the set of states grouped with the same amount of crop damage.
StormEvents
| summarize states=make_set(State) by DamageCrops
The results table shown includes only the first 10 rows.
DamageCrops | states |
---|---|
0 | [“NORTH CAROLINA”,“WISCONSIN”,“NEW YORK”,“ALASKA”,“DELAWARE”,“OKLAHOMA”,“INDIANA”,“ILLINOIS”,“MINNESOTA”,“SOUTH DAKOTA”,“TEXAS”,“UTAH”,“COLORADO”,“VERMONT”,“NEW JERSEY”,“VIRGINIA”,“CALIFORNIA”,“PENNSYLVANIA”,“MONTANA”,“WASHINGTON”,“OREGON”,“HAWAII”,“IDAHO”,“PUERTO RICO”,“MICHIGAN”,“FLORIDA”,“WYOMING”,“GULF OF MEXICO”,“NEVADA”,“LOUISIANA”,“TENNESSEE”,“KENTUCKY”,“MISSISSIPPI”,“ALABAMA”,“GEORGIA”,“SOUTH CAROLINA”,“OHIO”,“NEW MEXICO”,“ATLANTIC SOUTH”,“NEW HAMPSHIRE”,“ATLANTIC NORTH”,“NORTH DAKOTA”,“IOWA”,“NEBRASKA”,“WEST VIRGINIA”,“MARYLAND”,“KANSAS”,“MISSOURI”,“ARKANSAS”,“ARIZONA”,“MASSACHUSETTS”,“MAINE”,“CONNECTICUT”,“GUAM”,“HAWAII WATERS”,“AMERICAN SAMOA”,“LAKE HURON”,“DISTRICT OF COLUMBIA”,“RHODE ISLAND”,“LAKE MICHIGAN”,“LAKE SUPERIOR”,“LAKE ST CLAIR”,“LAKE ERIE”,“LAKE ONTARIO”,“E PACIFIC”,“GULF OF ALASKA”] |
30000 | [“TEXAS”,“NEBRASKA”,“IOWA”,“MINNESOTA”,“WISCONSIN”] |
4000000 | [“CALIFORNIA”,“KENTUCKY”,“NORTH DAKOTA”,“WISCONSIN”,“VIRGINIA”] |
3000000 | [“CALIFORNIA”,“ILLINOIS”,“MISSOURI”,“SOUTH CAROLINA”,“NORTH CAROLINA”,“MISSISSIPPI”,“NORTH DAKOTA”,“OHIO”] |
14000000 | [“CALIFORNIA”,“NORTH DAKOTA”] |
400000 | [“CALIFORNIA”,“MISSOURI”,“MISSISSIPPI”,“NEBRASKA”,“WISCONSIN”,“NORTH DAKOTA”] |
50000 | [“CALIFORNIA”,“GEORGIA”,“NEBRASKA”,“TEXAS”,“WEST VIRGINIA”,“KANSAS”,“MISSOURI”,“MISSISSIPPI”,“NEW MEXICO”,“IOWA”,“NORTH DAKOTA”,“OHIO”,“WISCONSIN”,“ILLINOIS”,“MINNESOTA”,“KENTUCKY”] |
18000 | [“WASHINGTON”,“WISCONSIN”] |
107900000 | [“CALIFORNIA”] |
28900000 | [“CALIFORNIA”] |
Set from array column
The following example shows the set of elements in an array.
datatable (Val: int, Arr1: dynamic)
[
1, dynamic(['A1', 'A2', 'A3']),
5, dynamic(['A2', 'C1']),
7, dynamic(['C2', 'A3']),
5, dynamic(['C2', 'A1'])
]
| summarize Val_set=make_set(Val), Arr1_set=make_set(Arr1)
Val_set | Arr1_set |
---|---|
[1,5,7] | [“A1”,“A2”,“A3”,“C1”,“C2”] |
Related content
26 - max() (aggregation function)
Finds the maximum value of the expression in the table.
Syntax
max(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression for which the maximum value is determined. |
Returns
Returns the value in the table that maximizes the specified expression.
Example
The following example returns the last record in a table by querying the maximum value for StartTime.
StormEvents
| summarize LatestEvent=max(StartTime)
Output
LatestEvent |
---|
2007-12-31T23:53:00Z |
Related content
27 - maxif() (aggregation function)
Calculates the maximum value of expr in records for which predicate evaluates to true
.
See also - max() function, which returns the maximum value across the group without predicate expression.
Syntax
maxif(
expr,
predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The expression used to filter rows. |
Returns
Returns the maximum value of expr in records for which predicate evaluates to true
.
Example
This example shows the maximum damage for events with no casualties.
StormEvents
| extend Damage=DamageCrops + DamageProperty, Deaths=DeathsDirect + DeathsIndirect
| summarize MaxDamageNoCasualties=maxif(Damage, Deaths == 0) by State
Output
The results table shown includes only the first 10 rows.
– | – |
---|---|
TEXAS | 25000000 |
KANSAS | 37500000 |
IOWA | 15000000 |
ILLINOIS | 5000000 |
MISSOURI | 500005000 |
GEORGIA | 344000000 |
MINNESOTA | 38390000 |
WISCONSIN | 45000000 |
NEBRASKA | 4000000 |
NEW YORK | 26000000 |
… | … |
Related content
28 - min() (aggregation function)
Finds the minimum value of the expression in the table.
Syntax
min
(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression for which the minimum value is determined. |
Returns
Returns the minimum value of expr across the table.
Example
This example returns the first record in a table.
StormEvents
| summarize FirstEvent=min(StartTime)
Output
FirstEvent |
---|
2007-01-01T00:00:00Z |
Related content
29 - minif() (aggregation function)
Returns the minimum of Expr in records for which Predicate evaluates to true
.
- Can be used only in context of aggregation inside summarize
See also - min() function, which returns the minimum value across the group without predicate expression.
Syntax
minif
(
Expr,
Predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
Expr | string | ✔️ | Expression that will be used for aggregation calculation. |
Predicate | string | ✔️ | Expression that will be used to filter rows. |
Returns
The minimum value of Expr in records for which Predicate evaluates to true
.
Example
This example shows the minimum damage for events with casualties (Except 0)
StormEvents
| extend Damage=DamageCrops+DamageProperty, Deaths=DeathsDirect+DeathsIndirect
| summarize MinDamageWithCasualties=minif(Damage,(Deaths >0) and (Damage >0)) by State
| where MinDamageWithCasualties >0 and isnotnull(MinDamageWithCasualties)
Output
The results table shown includes only the first 10 rows.
State | MinDamageWithCasualties |
---|---|
TEXAS | 8000 |
KANSAS | 5000 |
IOWA | 45000 |
ILLINOIS | 100000 |
MISSOURI | 10000 |
GEORGIA | 500000 |
MINNESOTA | 200000 |
WISCONSIN | 10000 |
NEW YORK | 25000 |
NORTH CAROLINA | 15000 |
… | … |
Related content
30 - percentile(), percentiles()
The percentile()
function calculates an estimate for the specified nearest-rank percentile of the population defined by expr.
The accuracy depends on the density of population in the region of the percentile.
percentiles()
works similarly to percentile()
. However, percentiles()
can calculate multiple percentile values at once, which is more efficient than calculating each percentile value separately.
To calculate weighted percentiles, see percentilesw().
Syntax
percentile(
expr,
percentile)
percentiles(
expr,
percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
percentile | int or long | ✔️ | A constant that specifies the percentile. |
percentiles | int or long | ✔️ | One or more comma-separated percentiles. |
Returns
Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.
Examples
Calculate single percentile
The following example shows the value of DamageProperty
being larger than 95% of the sample set and smaller than 5% of the sample set.
StormEvents | summarize percentile(DamageProperty, 95) by State
Output
The results table shown includes only the first 10 rows.
State | percentile_DamageProperty_95 |
---|---|
ATLANTIC SOUTH | 0 |
FLORIDA | 40000 |
GEORGIA | 143333 |
MISSISSIPPI | 80000 |
AMERICAN SAMOA | 250000 |
KENTUCKY | 35000 |
OHIO | 150000 |
KANSAS | 51392 |
MICHIGAN | 49167 |
ALABAMA | 50000 |
Calculate multiple percentiles
The following example shows the value of DamageProperty
simultaneously calculated using 5, 50 (median) and 95.
StormEvents | summarize percentiles(DamageProperty, 5, 50, 95) by State
Output
The results table shown includes only the first 10 rows.
State | percentile_DamageProperty_5 | percentile_DamageProperty_50 | percentile_DamageProperty_95 |
---|---|---|---|
ATLANTIC SOUTH | 0 | 0 | 0 |
FLORIDA | 0 | 0 | 40000 |
GEORGIA | 0 | 0 | 143333 |
MISSISSIPPI | 0 | 0 | 80000 |
AMERICAN SAMOA | 0 | 0 | 250000 |
KENTUCKY | 0 | 0 | 35000 |
OHIO | 0 | 2000 | 150000 |
KANSAS | 0 | 0 | 51392 |
MICHIGAN | 0 | 0 | 49167 |
ALABAMA | 0 | 0 | 50000 |
… | … |
Return percentiles as an array
Instead of returning the values in individual columns, use the percentiles_array()
function to return the percentiles in a single column of dynamic array type.
Syntax
percentiles_array(
expr,
percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
percentiles | int, long, or dynamic | ✔️ | One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value. |
Returns
Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.
Examples
Comma-separated percentiles
Multiple percentiles can be obtained as an array in a single dynamic column, instead of in multiple columns as with percentiles().
TransformedSensorsData
| summarize percentiles_array(Value, 5, 25, 50, 75, 95), avg(Value) by SensorName
Output
The results table displays only the first 10 rows.
SensorName | percentiles_Value | avg_Value |
---|---|---|
sensor-82 | [“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”] | 0.493950914 |
sensor-130 | [“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”] | 0.505111463 |
sensor-56 | [“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”] | 0.497955018 |
sensor-24 | [“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”] | 0.501084379 |
sensor-47 | [“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”] | 0.49386228 |
sensor-135 | [“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”] | 0.494817619 |
sensor-74 | [“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”] | 0.501627252 |
sensor-173 | [“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”] | 0.505401226 |
sensor-28 | [“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”] | 0.502066244 |
sensor-34 | [“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”] | 0.504309494 |
… | … | … |
Dynamic array of percentiles
Percentiles for percentiles_array
can be specified in a dynamic array of integer or floating-point numbers. The array must be constant but doesn’t have to be literal.
TransformedSensorsData
| summarize percentiles_array(Value, dynamic([5, 25, 50, 75, 95])), avg(Value) by SensorName
Output
The results table displays only the first 10 rows.
SensorName | percentiles_Value | avg_Value |
---|---|---|
sensor-82 | [“0.048141473520867069”,“0.24407515500271132”,“0.48974511106780577”,“0.74160998970950343”,“0.94587903204190071”] | 0.493950914 |
sensor-130 | [“0.049200214398937764”,“0.25735850440187535”,“0.51206374010048239”,“0.74182335059053839”,“0.95210342463616771”] | 0.505111463 |
sensor-56 | [“0.04857779335488676”,“0.24709868149337144”,“0.49668762923789589”,“0.74458470404241883”,“0.94889104840865857”] | 0.497955018 |
sensor-24 | [“0.051507199150534679”,“0.24803904945640423”,“0.50397070213183581”,“0.75653888126010793”,“0.9518782718727431”] | 0.501084379 |
sensor-47 | [“0.045991246974755672”,“0.24644331118208851”,“0.48089197707088743”,“0.74475142784472248”,“0.9518322864959039”] | 0.49386228 |
sensor-135 | [“0.05132897529660399”,“0.24204987641954018”,“0.48470113942206461”,“0.74275730068433621”,“0.94784079559229406”] | 0.494817619 |
sensor-74 | [“0.048914714739047828”,“0.25160926036445724”,“0.49832498850160978”,“0.75257887767110776”,“0.94932261924236094”] | 0.501627252 |
sensor-173 | [“0.048333149363009836”,“0.26084250046756496”,“0.51288012531934613”,“0.74964772791583412”,“0.95156058795294”] | 0.505401226 |
sensor-28 | [“0.048511161184567046”,“0.2547387968731824”,“0.50101318228599656”,“0.75693845702682039”,“0.95243122486483989”] | 0.502066244 |
sensor-34 | [“0.049980293859462954”,“0.25094722564949412”,“0.50914023067384762”,“0.75571549713447961”,“0.95176564809278674”] | 0.504309494 |
… | … | … |
Nearest-rank percentile
P-th percentile (0 < P <= 100) of a list of ordered values, sorted in ascending order, is the smallest value in the list. The P percent of the data is less or equal to P-th percentile value (from Wikipedia article on percentiles).
Define 0-th percentiles to be the smallest member of the population.
Estimation error in percentiles
The percentiles aggregate provides an approximate value using T-Digest.
Related content
31 - percentilew(), percentilesw()
The percentilew()
function calculates a weighted estimate for the specified nearest-rank percentile of the population defined by expr. percentilesw()
works similarly to percentilew()
. However, percentilesw()
can calculate multiple weighted percentile values at once, which is more efficient than calculating each weighted percentile value separately.
Weighted percentiles calculate percentiles in a dataset by giving each value in the input dataset a weight. In this method, each value is considered to be repeated a number of times equal to its weight, which is then used to calculate the percentile. By giving more importance to certain values, weighted percentiles provide a way to calculate percentiles in a “weighted” manner.
To calculate unweighted percentiles, see percentiles().
Syntax
percentilew(
expr,
weightExpr,
percentile)
percentilesw(
expr,
weightExpr,
percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
weightExpr | long | ✔️ | The weight to give each value. |
percentile | int or long | ✔️ | A constant that specifies the percentile. |
percentiles | int or long | ✔️ | One or more comma-separated percentiles. |
Returns
Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.
Examples
Calculate weighted percentiles
Assume you repetitively measure the time (Duration) it takes an action to complete. Instead of recording every value of the measurement, you record each value of Duration, rounded to 100 msec, and how many times the rounded value appeared (BucketSize).
Use summarize percentilesw(Duration, BucketSize, ...)
to calculate the given
percentiles in a “weighted” way. Treat each value of Duration as if it was repeated BucketSize times in the input, without actually needing to materialize those records.
The following example shows weighted percentiles.
Using the following set of latency values in milliseconds:
{ 1, 1, 2, 2, 2, 5, 7, 7, 12, 12, 15, 15, 15, 18, 21, 22, 26, 35 }
.
To reduce bandwidth and storage, do pre-aggregation to the
following buckets: { 10, 20, 30, 40, 50, 100 }
. Count the number of events in each bucket to produce the following table:
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
The table displays:
- Eight events in the 10-ms bucket (corresponding to subset
{ 1, 1, 2, 2, 2, 5, 7, 7 }
) - Six events in the 20-ms bucket (corresponding to subset
{ 12, 12, 15, 15, 15, 18 }
) - Three events in the 30-ms bucket (corresponding to subset
{ 21, 22, 26 }
) - One event in the 40-ms bucket (corresponding to subset
{ 35 }
)
At this point, the original data is no longer available. Only the number of events in each bucket. To compute percentiles from this data, use the percentilesw()
function.
For the 50, 75, and 99.9 percentiles, use the following query:
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw(LatencyBucket, ReqCount, 50, 75, 99.9)
Output
percentile_LatencyBucket_50 | percentile_LatencyBucket_75 | percentile_LatencyBucket_99_9 |
---|---|---|
20 | 20 | 40 |
Return percentiles as an array
Instead of returning the values in individual columns, use the percentilesw_array()
function to return the percentiles in a single column of dynamic array type.
Syntax
percentilesw_array(
expr,
weightExpr,
percentiles)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for aggregation calculation. |
percentiles | int, long, or dynamic | ✔️ | One or more comma-separated percentiles or a dynamic array of percentiles. Each percentile can be an integer or long value. |
weightExpr | long | ✔️ | The weight to give each value. |
Returns
Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.
Examples
Comma-separated percentiles
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, 50, 75, 99.9)
Output
percentile_LatencyBucket | ||
---|---|---|
[20, 20, 40] |
Dynamic array of percentiles
let latencyTable = datatable (ReqCount:long, LatencyBucket:long)
[
8, 10,
6, 20,
3, 30,
1, 40
];
latencyTable
| summarize percentilesw_array(LatencyBucket, ReqCount, dynamic([50, 75, 99.9]))
Output
percentile_LatencyBucket | ||
---|---|---|
[20, 20, 40] |
Related content
32 - stdev() (aggregation function)
Calculates the standard deviation of expr across the group, using Bessel’s correction for a small dataset that is considered a sample.
For a large dataset that is representative of the population, use stdevp() (aggregation function).
Formula
This function uses the following formula.
Syntax
stdev(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the standard deviation aggregation calculation. |
Returns
Returns the standard deviation value of expr across the group.
Example
The following example shows the standard deviation for the group.
range x from 1 to 5 step 1
| summarize make_list(x), stdev(x)
Output
list_x | stdev_x |
---|---|
[ 1, 2, 3, 4, 5] | 1.58113883008419 |
33 - stdevif() (aggregation function)
Calculates the standard deviation of expr in records for which predicate evaluates to true
.
Syntax
stdevif(
expr,
predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the standards deviation aggregation calculation. |
predicate | string | ✔️ | The predicate that has to evaluate to true in order for expr to be added to the result. |
Returns
Returns the standard deviation value of expr in records for which predicate evaluates to true
.
Example
The following example shows the standard deviation in a range of 1 to 100.
range x from 1 to 100 step 1
| summarize stdevif(x, x % 2 == 0)
Output
stdevif_x |
---|
29.1547594742265 |
34 - stdevp() (aggregation function)
Calculates the standard deviation of expr across the group, considering the group as a population for a large dataset that is representative of the population.
For a small dataset that is a sample, use stdev() (aggregation function).
Formula
This function uses the following formula.
Syntax
stdevp(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the standards deviation aggregation calculation. |
Returns
Returns the standard deviation value of expr across the group.
Example
range x from 1 to 5 step 1
| summarize make_list(x), stdevp(x)
Output
list_x | stdevp_x |
---|---|
[ 1, 2, 3, 4, 5] | 1.4142135623731 |
35 - sum() (aggregation function)
Calculates the sum of expr across the group.
Syntax
sum(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr string | ✔️ | The expression used for the aggregation calculation. |
Returns
Returns the sum value of expr across the group.
Example
This example returns the total value of crop and property damages by state, and sorted in descending value.
StormEvents
| summarize EventCount=count(), TotalDamages = sum(DamageCrops+DamageProperty) by State
| sort by TotalDamages
Output
The results table shown includes only the first 10 rows.
| State | Eventcount | TotalDamages | | —- | — | | CALIFORNIA | 898 | 2801954600 | | GEORGIA | 1983 | 1190448750 | | MISSOURI | 2016 | 1096887450 | | OKLAHOMA | 1716 | 916557300 | | MISSISSIPPI | 1218 | 802890160 | | KANSAS | 3166 | 738830000 | | TEXAS | 4701 | 572086700 | | OHIO | 1233 | 417989500 | | FLORIDA | 1042 | 379455260 | | NORTH DAKOTA | 905 | 342460100 | | … | … | … |
36 - sumif() (aggregation function)
Calculates the sum of expr in records for which predicate evaluates to true
.
You can also use the sum() function, which sums rows without predicate expression.
Syntax
sumif(
expr,
predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
predicate | string | ✔️ | The expression used to filter rows. If the predicate evaluates to true , the row will be included in the result. |
Returns
Returns the sum of expr for which predicate evaluates to true
.
Example showing the sum of damages based on no casualty count
This example shows the sum total damage for storms without casualties.
StormEvents
| summarize DamageNoCasualties=sumif((DamageCrops+DamageProperty),(DeathsDirect+DeathsIndirect)==0) by State
Output
The results table shown includes only the first 10 rows.
State | DamageNoCasualties |
---|---|
TEXAS | 242638700 |
KANSAS | 407360000 |
IOWA | 135353700 |
ILLINOIS | 120394500 |
MISSOURI | 1096077450 |
GEORGIA | 1077448750 |
MINNESOTA | 230407300 |
WISCONSIN | 241550000 |
NEBRASKA | 70356050 |
NEW YORK | 58054000 |
… | … |
Example showing the sum of birth dates
This example shows the sum of the birth dates for all names that have more than 4 letters.
let T = datatable(name:string, day_of_birth:long)
[
"John", 9,
"Paul", 18,
"George", 25,
"Ringo", 7
];
T
| summarize sumif(day_of_birth, strlen(name) > 4)
Output
sumif_day_of_birth |
---|
32 |
37 - take_any() (aggregation function)
Arbitrarily chooses one record for each group in a summarize operator, and returns the value of one or more expressions over each such record.
Syntax
take_any(
expr_1 [,
expr_2 …])
take_any(
*)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr_N | string | ✔️ | The expression used for selecting a record. If the wildcard value (* ) is given in place of an expression, all records will be selected. |
Returns
The take_any
aggregation function returns the values of the expressions calculated
for each of the records selected Indeterministically from each group of the summarize operator.
If the *
argument is provided, the function behaves as if the expressions are all columns
of the input to the summarize operator barring the group-by columns, if any.
Remarks
This function is useful when you want to get a sample value of one or more columns per value of the compound group key.
When the function is provided with a single column reference, it will attempt to return a non-null/non-empty value, if such value is present.
As a result of the indeterministic nature of this function, using this function multiple times in
a single application of the summarize
operator isn’t equivalent to using
this function a single time with multiple expressions. The former may have each application
select a different record, while the latter guarantees that all values are calculated
over a single record (per distinct group).
Examples
Show indeterministic State:
StormEvents
| summarize take_any(State)
Output
State |
---|
ATLANTIC SOUTH |
Show all the details for a random record:
StormEvents
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*)
Output
StartTime | EpisodeId | State | EventType |
---|---|---|---|
2007-09-29 08:11:00.0000000 | 11091 | ATLANTIC SOUTH | Waterspout |
Show all the details of a random record for each State starting with ‘A’:
StormEvents
| where State startswith "A"
| project StartTime, EpisodeId, State, EventType
| summarize take_any(*) by State
Output
State | StartTime | EpisodeId | EventType |
---|---|---|---|
ALASKA | 2007-02-01 00:00:00.0000000 | 1733 | Flood |
ATLANTIC SOUTH | 2007-09-29 08:11:00.0000000 | 11091 | Waterspout |
ATLANTIC NORTH | 2007-11-27 00:00:00.0000000 | 11523 | Marine Thunderstorm Wind |
ARIZONA | 2007-12-01 10:40:00.0000000 | 11955 | Flash Flood |
AMERICAN SAMOA | 2007-12-07 14:00:00.0000000 | 13183 | Flash Flood |
ARKANSAS | 2007-12-09 16:00:00.0000000 | 11319 | Lightning |
ALABAMA | 2007-12-15 18:00:00.0000000 | 12580 | Heavy Rain |
Related content
38 - take_anyif() (aggregation function)
Arbitrarily selects one record for each group in a summarize operator in records for which the predicate is ’true’. The function returns the value of an expression over each such record.
This function is useful when you want to get a sample value of one column per value of the compound group key, subject to some predicate that is true. If such a value is present, the function attempts to return a non-null/non-empty value.
Syntax
take_anyif(
expr,
predicate )
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for selecting a record. |
predicate | string | ✔️ | Indicates which records may be considered for evaluation. |
Returns
The take_anyif
aggregation function returns the value of the expression calculated
for each of the records randomly selected from each group of the summarize operator. Only records for which predicate returns ’true’ may be selected. If the predicate doesn’t return ’true’, a null value is produced.
Examples
Pick a random EventType from Storm events, where event description has a key phrase.
StormEvents
| summarize take_anyif(EventType, EventNarrative has 'strong wind')
Output
EventType |
---|
Strong Wind |
Related content
39 - tdigest_merge() (aggregation functions)
Merges tdigest results across the group.
For more information about the underlying algorithm (T-Digest) and the estimated error, see estimation error in percentiles.
Syntax
tdigest_merge(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
Returns
Returns the merged tdigest values of expr across the group.
Example
StormEvents
| summarize PreAggDamageProperty=tdigest(DamageProperty) by State
| summarize tdigest_merge(PreAggDamageProperty)
Output
merge_tdigests_PreAggDamageProperty |
---|
[[7],[91,30,73667,966,110000000,24428,2500,20000,16500000,6292,40000,123208,1000000,133091,90583,20000000,977000,20007,547000,19000000,1221,9600000,300000,70072,55940,75000,417500,1410000,20400000,331500,15000000,62000000,50222,121690000,160400,6200000,252500,450,11000000,2200000,5700000,11566,12000000,263,50000,200000,3700000,13286,171000,100000000,28200000,65000000,17709,30693,16000000,7938,5200,2875,1500000,3480000,151100000,9800000,18200000,21600000,199,2570000,30000000,38000000,72000,891250,500000000,26385,80092,27000000,35000000,754500,11500000,3262500,113945,5000,62429,175294,9071,6500000,3321,15159,21850000,300000000,22683,3000,10000000,60055,600000,52000000,496000,15000,50000000,10140000,11900000,2100000,62600000,77125,310667,70000000,101000000,2088,1608571,19182,400000,179833,775000,612000,150000000,13500000,2600000,1250000,65400,45000000,297000,2500000,40000000,24846,30000,59067,1893,15762,142571,220666,195000,2000000,355000,2275000,6000000,46000000,38264,50857,4002,97333,27750,1000,1111429,7043,272500,455200,503,37500000,10000,1489,0,1200000,110538,60000000,250000,10730,1901429,291000,698750,649000,2716667,137000000,6400000,29286,41051,6850000,102000,4602,80000000,250000000,371667,8000000,729,8120000,5000000,20830,152400,803300,349667,202000,207000,81150000,48000000,750000,26000000,8900000,239143,75000000,248000,14342,74857,5992,500000,150000,938000,10533333,45248,105000000,7000000,35030,4000000,2000,7692500,3000000,25000000,4500000,87222,12054,100000,25000,9771,4840000,28000000,1307143,32024],[19,1,3,32,1,14,45,572,1,51,126,41,101,11,12,8,2,14,4,1,27,1,58,42,20,177,6,4,1,12,10,2,9,1,5,1,2,28,3,6,1,23,4,30,610,145,1,21,4,2,1,1,24,13,1,153,5,4,26,5,1,6,1,1,28,1,5,1,11,4,1,13,44,2,4,2,1,4,9,1672,7,17,47,2,39,17,2,1,17,666,16,71,21,3,1,530,10,1,1,2,1,4,6,4,1,20,7,11,40,6,2,1,1,2,1,3,5,2,1,21,2,13,271,3,14,23,7,15,2,41,1,2,7,1,27,7,205,3,4,1403,7,69,4,10,215,1,1472,127,45756,10,13,1,198,17,7,1,12,7,6,1,1,14,7,2,2,17,1,2,3,2,48,5,21,10,5,10,21,4,5,1,2,39,2,2,7,1,1,22,7,60,175,119,3,3,40,1,8,101,15,1135,4,22,3,3,9,76,430,611,12,1,2,7,8]] |
40 - tdigest() (aggregation function)
Calculates the intermediate results of percentiles()
across the group.
For more information, see the underlying algorithm (T-Digest) and the estimated error.
Syntax
tdigest(
expr [,
weight])
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression used for the aggregation calculation. |
weight | string | The weights of the values for the aggregation calculation. |
Returns
The Intermediate results of weighted percentiles of *expr*
across the group.
Examples
Results per state
This example shows the results of the tdigest percentiles sorted by state.
StormEvents
| summarize tdigest(DamageProperty) by State
The results table shown includes only the first 10 rows.
State | tdigest_DamageProperty |
---|---|
NEBRASKA | [[7],[800,250,300000,5000,240000,1500000,20000,550000,0,75000,100000,1000,10000,30000,13000,2000000,1000000,650000,125000,35000,7000,2500000,4000000,450000,85000,460000,500000,6000,150000,350000,4000,72500,1200000,180000,400000,25000,50000,2000,45000,8000,120000,200000,40000,1200,15000,55000,3000,250000],[5,1,3,72,1,1,44,1,1351,12,24,17,46,13,6,1,2,1,2,6,8,1,1,1,2,1,4,2,6,1,2,2,1,1,2,26,18,12,2,2,1,7,6,4,28,4,6,6]] |
MINNESOTA | [[7],[700,500,2000000,2500,1200000,12000000,16000,7000000,0,300000,425000,750,6000,30000,10000,22000000,10000000,9600000,600000,50000,4000,27000000,35000000,4000000,400000,5000000,6000000,3000,750000,2500000,2000,250000,11000000,38000000,3000000,20000,120000,1000,100000,5000,500000,1000000,60000,800,15000,200000,1500,1500000,900000],[1,3,1,3,1,2,1,1,1793,1,1,2,2,2,3,1,1,1,2,2,1,1,1,1,2,1,2,1,1,1,6,1,1,1,3,5,1,5,2,5,2,2,1,2,2,2,2,1,1]] |
KANSAS | [[7],[667,200,6000000,3400,80000,300000,18875,210000,0,45857,750000,37500000,10000,81150000,15000000,6400000,2570000,225000,59400,25000,5000,400000,7000000,4500000,2500000,6500000,200000,4500,70000,122500,2785,12000000,1900000,18200000,150000,1150000,27000000,2000,30000,2000000,250000000,75000,26000,1500,1500000,1000000,2500,100000,21600000,50000,335000,600000,175000,500000,160000,51000,40000,20000,15000,252500,7520,350000,250000,3400000,1000,338000,16000000,106000,4840000,305000,540000,337500,9800000,45000,12500,700000,4000000,71000,30000000,35000,3700000,22000,56000],[12,2,2,5,2,3,8,1,2751,7,2,1,37,1,1,1,1,2,5,12,33,8,1,1,1,2,10,1,5,2,7,1,4,1,5,1,1,9,11,4,1,5,2,6,4,8,2,23,1,44,2,3,2,3,1,1,1,18,5,2,5,1,7,1,25,1,1,3,1,1,1,2,6,1,1,2,1,1,1,3,1,1,1]] |
NEW MEXICO | [[7],[600,500,2500000,7000,1500,28000,40000,10000,0,500000,20000,1000,21000,70000,25000,3500000,200000,16500000,50000,100000,15000,4000,5000,2000],[1,3,1,1,1,1,1,7,466,1,7,4,1,1,2,1,1,1,1,2,1,4,10,8]] |
KENTUCKY | [[7],[600,200,700000,5000,400000,12000,15000,100000,0,60000,80000,1000,9000,20000,10000,50000,30000,300000,120000,25000,7000,3000,500000,11500000,75000,35000,8000,6000,150000,1500000,4000,56000,1911,250000,2500000,18000,45000,2000],[6,2,1,42,1,3,9,8,999,2,1,52,1,21,37,25,7,2,3,14,11,35,1,1,6,10,9,10,4,1,13,1,9,3,1,2,1,37]] |
VIRGINIA | [[7],[536,500,125000,3000,100000,7250,8000,60000,0,40000,50000,956,6000,11500,7000,25000,15000,98000,70000,12000,4000,2000,120000,1000000,45000,16000,5000,3500,75000,175000,2500,30000,1000,80000,300000,10000,20000,1500],[7,11,1,48,2,2,2,1,1025,2,6,9,2,2,1,5,16,1,3,5,12,122,1,1,1,1,64,2,2,1,1,7,209,3,2,42,19,6]] |
OREGON | [[7],[5000,1000,60000,434000,20000,50000,100000,500000,0,1500000,20400000,6000,62600000],[8,2,1,1,1,1,3,1,401,1,1,1,1]] |
ALASKA | [[7],[5000,1000,25000,700000,12060,15000,100000,1600000,0,10000],[5,1,1,1,1,2,1,2,242,1]] |
CONNECTICUT | [[7],[5000,1000,2000000,0,50000,750000,6000],[1,1,1,142,1,1,1]] |
NEVADA | [[7],[5000,1000,200000,1000000,30000,40000,297000,5000000,0,10000],[4,2,1,1,1,1,1,1,148,3]] |
Convert pre-existing centroids
The following example shows how one can convert pre-existing T-Digest centroids for long-term storage.
The V
column holds the value of each centroid, and the W
column is its weight (relative count).
The tdigest()
aggregate function is then applied to convert the data in table DT
into the internal
representation, and percentile_tdigest()
is used to demonstrate how ot find the 50-tile value.
let DT=datatable(V:real, W:long) [
1.0, 1,
2.0, 2
];
DT
| summarize TD=tdigest(V, W)
| project P50=percentile_tdigest(TD, 50)
P50 |
---|
2 |
41 - variance() (aggregation function)
Calculates the variance of expr across the group, considering the group as a sample.
The following formula is used:
Syntax
variance(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | real | ✔️ | The expression used for the variance calculation. |
Returns
Returns the variance value of expr across the group.
Example
range x from 1 to 5 step 1
| summarize make_list(x), variance(x)
Output
list_x | variance_x |
---|---|
[ 1, 2, 3, 4, 5] | 2.5 |
42 - varianceif() (aggregation function)
Calculates the variance of expr in records for which predicate evaluates to true
.
Syntax
varianceif(
expr,
predicate)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for the variance calculation. |
predicate | string | ✔️ | If predicate evaluates to true , the expr calculated value will be added to the variance. |
Returns
Returns the variance value of expr in records for which predicate evaluates to true
.
Example
range x from 1 to 100 step 1
| summarize varianceif(x, x%2 == 0)
Output
varianceif_x |
---|
850 |
43 - variancep() (aggregation function)
Calculates the variance of expr across the group, considering the group as a population.
The following formula is used:
Syntax
variancep(
expr)
Parameters
Name | Type | Required | Description |
---|---|---|---|
expr | string | ✔️ | The expression to use for the variance calculation. |
Returns
Returns the variance value of expr across the group.
Example
range x from 1 to 5 step 1
| summarize make_list(x), variancep(x)
Output
list_x | variance_x |
---|---|
[ 1, 2, 3, 4, 5] | 2 |