percentiles_linear_fl()
The function percentiles_linear_fl() is a user-defined function (UDF) that calculates percentiles using linear interpolation between closest ranks, the same method used by Excel’s PERCENTILES.INC function. Kusto native percentile functions use the nearest rank method. For large sets of values the difference between both methods is insignificant, and we recommend using the native function for best performance. For further details on these and additional percentile calculation methods have a look at percentile article on Wikipedia.
The function accepts a table containing the column to calculate on and an optional grouping key, and a dynamic array of the required percentiles, and returns a column containing dynamic array of the percentiles’ values per each group.
Syntax
T | invoke percentiles_linear_fl(val_col, pct_arr [, aggr_col ])
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| val_col | string | ✔️ | The name of the column that contains the values with which to calculate the percentiles. |
| pct_arr | dynamic | ✔️ | A numerical array containing the required percentiles. Each percentile should be in the range [0-100]. |
| aggr_col | string | The name of the column that contains the grouping key. |
Function definition
You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:
Query-defined
Define the function using the following let statement. No permissions are required.
let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
};
// Write your query to use the function here.
Stored
Define the stored function once using the following .create function. Database User permissions are required.
.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate linear interpolated percentiles (identical to Excel's PERCENTILE.INC)")
percentiles_linear_fl(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
}
Example
The following example uses the invoke operator to run the function.
Query-defined
To use a query-defined function, invoke it after the embedded function definition.
let percentiles_linear_fl=(tbl:(*), val_col:string, pct_arr:dynamic, aggr_col:string='')
{
tbl
| extend _vals = column_ifexists(val_col, 0.0)
| extend _key = column_ifexists(aggr_col, 'ALL')
| order by _key asc, _vals asc
| summarize _vals=make_list(_vals) by _key
| extend n = array_length(_vals)
| extend pct=pct_arr
| mv-apply pct to typeof(real) on (
extend index=pct/100.0*(n-1)
| extend low_index=tolong(floor(index, 1)), high_index=tolong(ceiling(index))
| extend interval=todouble(_vals[high_index])-todouble(_vals[low_index])
| extend pct_val=todouble(_vals[low_index])+(index-low_index)*interval
| summarize pct_arr=make_list(pct), pct_val=make_list(pct_val))
| project-away n
};
datatable(x:long, name:string) [
5, 'A',
9, 'A',
7, 'A',
5, 'B',
7, 'B',
7, 'B',
10, 'B',
]
| invoke percentiles_linear_fl('x', dynamic([0, 25, 50, 75, 100]), 'name')
| project-rename name=_key, x=_vals
Stored
datatable(x:long, name:string) [
5, 'A',
9, 'A',
7, 'A',
5, 'B',
7, 'B',
7, 'B',
10, 'B',
]
| invoke percentiles_linear_fl('x', dynamic([0, 25, 50, 75, 100]), 'name')
| project-rename name=_key, x=_vals
Output
| name | x | pct_arr | pct_val |
|---|---|---|---|
| A | [5,7,9] | [0,25,50,75,100] | [5,6,7,8,9] |
| B | [5,7,7,10] | [0,25,50,75,100] | [5,6.5,7,7.75,10] |
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.