Plugins
- 1: Data reshaping plugins
- 1.1: bag_unpack plugin
- 1.2: narrow plugin
- 1.3: pivot plugin
- 2: General plugins
- 2.1: dcount_intersect plugin
- 2.2: infer_storage_schema plugin
- 2.3: infer_storage_schema_with_suggestions plugin
- 2.4: ipv4_lookup plugin
- 2.5: ipv6_lookup plugin
- 2.6: preview plugin
- 2.7: schema_merge plugin
- 3: Language plugins
- 3.1: Python plugin
- 3.2: Python plugin packages
- 3.3: R plugin (Preview)
- 4: Machine learning plugins
- 4.1: autocluster plugin
- 4.2: basket plugin
- 4.3: diffpatterns plugin
- 4.4: diffpatterns_text plugin
- 5: Query connectivity plugins
- 5.1: ai_embed_text plugin (Preview)
- 5.2: azure_digital_twins_query_request plugin
- 5.3: cosmosdb_sql_request plugin
- 5.4: http_request plugin
- 5.5: http_request_post plugin
- 5.6: mysql_request plugin
- 5.7: postgresql_request plugin
- 5.8: sql_request plugin
- 6: User and sequence analytics plugins
- 6.1: active_users_count plugin
- 6.2: activity_counts_metrics plugin
- 6.3: activity_engagement plugin
- 6.4: activity_metrics plugin
- 6.5: funnel_sequence plugin
- 6.6: funnel_sequence_completion plugin
- 6.7: new_activity_metrics plugin
- 6.8: rolling_percentile plugin
- 6.9: rows_near plugin
- 6.10: sequence_detect plugin
- 6.11: session_count plugin
- 6.12: sliding_window_counts plugin
- 6.13: User Analytics
1 - Data reshaping plugins
1.1 - bag_unpack plugin
The bag_unpack plugin unpacks a single column of type dynamic by treating each property bag top-level slot as a column. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate bag_unpack( Column [, OutputColumnPrefix ] [, columnsConflict ] [, ignoredProperties ] ) [: OutputSchema ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose column Column is to be unpacked. |
Column | dynamic | ✔️ | The column of T to unpack. |
OutputColumnPrefix | string | | A common prefix to add to all columns produced by the plugin. |
columnsConflict | string | | The direction for column conflict resolution. Valid values: error - the query produces an error (default); replace_source - the source column is replaced; keep_source - the source column is kept. |
ignoredProperties | dynamic | | An optional set of bag properties to be ignored. |
OutputSchema | | | The names and types for the expected columns of the bag_unpack plugin output. Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. For syntax information, see Output schema syntax. |
Output schema syntax
( ColumnName : ColumnType [, …] )
To add all columns of the input table to the plugin output, use a wildcard * as the first parameter, as follows:
( * , ColumnName : ColumnType [, …] )
Returns
The bag_unpack plugin returns a table with as many records as its tabular input (T). The schema of the table is the same as the schema of its tabular input with the following modifications:
- The specified input column (Column) is removed.
- The schema is extended with as many columns as there are distinct slots in the top-level property bag values of T. The name of each column corresponds to the name of each slot, optionally prefixed by OutputColumnPrefix. Its type is either the type of the slot, if all values of the same slot have the same type, or dynamic, if the values differ in type.
Examples
Expand a bag
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d)
Output
Age | Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
Expand a bag with OutputColumnPrefix
Expand a bag and use the OutputColumnPrefix option to produce column names that begin with the prefix ‘Property_’.
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, 'Property_')
Output
Property_Age | Property_Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
Expand a bag with columnsConflict
Expand a bag and use the columnsConflict option to resolve conflicts between existing columns and columns produced by the bag_unpack() operator.
datatable(Name:string, d:dynamic)
[
'Old_name', dynamic({"Name": "John", "Age":20}),
'Old_name', dynamic({"Name": "Dave", "Age":40}),
'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='replace_source') // Use new name
Output
Age | Name |
---|---|
20 | John |
40 | Dave |
30 | Jasmine |
datatable(Name:string, d:dynamic)
[
'Old_name', dynamic({"Name": "John", "Age":20}),
'Old_name', dynamic({"Name": "Dave", "Age":40}),
'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='keep_source') // Keep old name
Output
Age | Name |
---|---|
20 | Old_name |
40 | Old_name |
30 | Old_name |
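With the default columnsConflict='error', the same kind of conflicting input produces an error instead of a result. The following is a minimal sketch of that default behavior; the exact error message may vary:
datatable(Name:string, d:dynamic)
[
'Old_name', dynamic({"Name": "John", "Age":20}),
]
| evaluate bag_unpack(d) // Fails: the unpacked property 'Name' conflicts with the existing 'Name' column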
Expand a bag with ignoredProperties
Expand a bag and use the ignoredProperties option to ignore certain properties in the property bag.
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20, "Address": "Address-1" }),
dynamic({"Name": "Dave", "Age":40, "Address": "Address-2"}),
dynamic({"Name": "Jasmine", "Age":30, "Address": "Address-3"}),
]
// Ignore 'Age' and 'Address' properties
| evaluate bag_unpack(d, ignoredProperties=dynamic(['Address', 'Age']))
Output
Name |
---|
John |
Dave |
Jasmine |
Expand a bag with a query-defined OutputSchema
Expand a bag and use the OutputSchema option to allow various optimizations to be evaluated before running the actual query.
datatable(d:dynamic)
[
dynamic({"Name": "John", "Age":20}),
dynamic({"Name": "Dave", "Age":40}),
dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d) : (Name:string, Age:long)
Output
Name | Age |
---|---|
John | 20 |
Dave | 40 |
Jasmine | 30 |
Expand a bag and use the OutputSchema option to allow various optimizations to be evaluated before running the actual query. Use a wildcard * to return all columns of the input table.
datatable(d:dynamic, Description: string)
[
dynamic({"Name": "John", "Age":20}), "Student",
dynamic({"Name": "Dave", "Age":40}), "Teacher",
dynamic({"Name": "Jasmine", "Age":30}), "Student",
]
| evaluate bag_unpack(d) : (*, Name:string, Age:long)
Output
Description | Name | Age |
---|---|---|
Student | John | 20 |
Teacher | Dave | 40 |
Student | Jasmine | 30 |
1.2 - narrow plugin
The narrow plugin “unpivots” a wide table into a table with three columns:
- Row number
- Column type
- Column value (as string)
The narrow plugin is designed mainly for display purposes, as it allows wide tables to be displayed comfortably without the need for horizontal scrolling. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate narrow()
Examples
The following example shows an easy way to read the output of the Kusto .show diagnostics management command.
.show diagnostics
| evaluate narrow()
The result of .show diagnostics itself is a table with a single row and 33 columns. By using the narrow plugin, we “rotate” the output to something like this:
Row | Column | Value |
---|---|---|
0 | IsHealthy | True |
0 | IsRebalanceRequired | False |
0 | IsScaleOutRequired | False |
0 | MachinesTotal | 2 |
0 | MachinesOffline | 0 |
0 | NodeLastRestartedOn | 2017-03-14 10:59:18.9263023 |
0 | AdminLastElectedOn | 2017-03-14 10:58:41.6741934 |
0 | ClusterWarmDataCapacityFactor | 0.130552847673333 |
0 | ExtentsTotal | 136 |
0 | DiskColdAllocationPercentage | 5 |
0 | InstancesTargetBasedOnDataCapacity | 2 |
0 | TotalOriginalDataSize | 5167628070 |
0 | TotalExtentSize | 1779165230 |
0 | IngestionsLoadFactor | 0 |
0 | IngestionsInProgress | 0 |
0 | IngestionsSuccessRate | 100 |
0 | MergesInProgress | 0 |
0 | BuildVersion | 1.0.6281.19882 |
0 | BuildTime | 2017-03-13 11:02:44.0000000 |
0 | ClusterDataCapacityFactor | 0.130552847673333 |
0 | IsDataWarmingRequired | False |
0 | RebalanceLastRunOn | 2017-03-21 09:14:53.8523455 |
0 | DataWarmingLastRunOn | 2017-03-21 09:19:54.1438800 |
0 | MergesSuccessRate | 100 |
0 | NotHealthyReason | [null] |
0 | IsAttentionRequired | False |
0 | AttentionRequiredReason | [null] |
0 | ProductVersion | KustoRelease_2017.03.13.2 |
0 | FailedIngestOperations | 0 |
0 | FailedMergeOperations | 0 |
0 | MaxExtentsInSingleTable | 64 |
0 | TableWithMaxExtents | KustoMonitoringPersistentDatabase.KustoMonitoringTable |
0 | WarmExtentSize | 1779165230 |
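The plugin works the same way on any tabular input. The following is a minimal sketch on a small ad-hoc table (the column names and values are illustrative):
datatable(Name:string, Age:long, City:string)
[
"John", 20, "Seattle"
]
| evaluate narrow()
This should yield output similar to:
Row | Column | Value |
---|---|---|
0 | Name | John |
0 | Age | 20 |
0 | City | Seattle |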
1.3 - pivot plugin
Rotates a table by turning the unique values from one column in the input table into multiple columns in the output table. The plugin performs aggregations as required on any remaining column values that appear in the final output.
Syntax
T | evaluate pivot( pivotColumn [, aggregationFunction ] [, column1 [, column2 … ]]) [: OutputSchema ]
Parameters
Name | Type | Required | Description |
---|---|---|---|
pivotColumn | string | ✔️ | The column to rotate. Each unique value from this column will be a column in the output table. |
aggregationFunction | string | | An aggregation function used to aggregate multiple rows in the input table to a single row in the output table. Currently supported functions: min(), max(), take_any(), sum(), dcount(), avg(), stdev(), variance(), make_list(), make_bag(), make_set(), count(). The default is count(). |
column1, column2, … | string | | A column name or comma-separated list of column names. The output table contains an additional column for each specified column. The default is all columns other than the pivoted column and the aggregation column. |
OutputSchema | | | The names and types for the expected columns of the pivot plugin output. Syntax: ( ColumnName : ColumnType [, …] ). Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. An error is raised if the run-time schema doesn’t match the OutputSchema schema. |
Returns
Pivot returns the rotated table with the specified columns (column1, column2, …) plus all unique values of the pivot column. Each cell for the pivoted columns contains the aggregate function computation.
Examples
Pivot by a column
For each EventType and State starting with ‘AL’, count the number of events of this type in this state.
StormEvents
| project State, EventType
| where State startswith "AL"
| where EventType has "Wind"
| evaluate pivot(State)
Output
EventType | ALABAMA | ALASKA |
---|---|---|
Thunderstorm Wind | 352 | 1 |
High Wind | 0 | 95 |
Extreme Cold/Wind Chill | 0 | 10 |
Strong Wind | 22 | 0 |
Pivot by a column with aggregation function
For each EventType and State starting with ‘AR’, display the total number of direct deaths.
StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect))
Output
EventType | ARKANSAS | ARIZONA |
---|---|---|
Heavy Rain | 1 | 0 |
Thunderstorm Wind | 1 | 0 |
Lightning | 0 | 1 |
Flash Flood | 0 | 6 |
Strong Wind | 1 | 0 |
Heat | 3 | 0 |
Pivot by a column with aggregation function and a single additional column
The result is identical to the previous example.
StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType)
Output
EventType | ARKANSAS | ARIZONA |
---|---|---|
Heavy Rain | 1 | 0 |
Thunderstorm Wind | 1 | 0 |
Lightning | 0 | 1 |
Flash Flood | 0 | 6 |
Strong Wind | 1 | 0 |
Heat | 3 | 0 |
Specify the pivoted column, aggregation function, and multiple additional columns
For each event type, source, and state, sum the number of direct deaths.
StormEvents
| where State startswith "AR"
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType, Source)
Output
EventType | Source | ARKANSAS | ARIZONA |
---|---|---|---|
Heavy Rain | Emergency Manager | 1 | 0 |
Thunderstorm Wind | Emergency Manager | 1 | 0 |
Lightning | Newspaper | 0 | 1 |
Flash Flood | Trained Spotter | 0 | 2 |
Flash Flood | Broadcast Media | 0 | 3 |
Flash Flood | Newspaper | 0 | 1 |
Strong Wind | Law Enforcement | 1 | 0 |
Heat | Newspaper | 3 | 0 |
Pivot with a query-defined output schema
The following example selects specific columns in the StormEvents table. It uses an explicit schema definition that allows various optimizations to be evaluated before running the actual query.
StormEvents
| project State, EventType
| where EventType has "Wind"
| evaluate pivot(State): (EventType:string, ALABAMA:long, ALASKA:long)
Output
EventType | ALABAMA | ALASKA |
---|---|---|
Thunderstorm Wind | 352 | 1 |
High Wind | 0 | 95 |
Marine Thunderstorm Wind | 0 | 0 |
Strong Wind | 22 | 0 |
Extreme Cold/Wind Chill | 0 | 10 |
Cold/Wind Chill | 0 | 0 |
Marine Strong Wind | 0 | 0 |
Marine High Wind | 0 | 0 |
2 - General plugins
2.1 - dcount_intersect plugin
Calculates the intersection between N sets based on hll values (N in the range [2..16]), and returns N dcount values. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate dcount_intersect( hll_1, hll_2 [, hll_3, …] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
hll_i | | | The values of set Si calculated with the hll() function. |
Returns
Returns a table with N dcount values (one per column), representing the set intersections. Column names are s0, s1, …, s(n-1).
Given sets S1, S2, …, Sn, the returned values represent distinct counts of:
S1,
S1 ∩ S2,
S1 ∩ S2 ∩ S3,
…,
S1 ∩ S2 ∩ … ∩ Sn
Examples
// Generate numbers from 1 to 100
range x from 1 to 100 step 1
| extend isEven = (x % 2 == 0), isMod3 = (x % 3 == 0), isMod5 = (x % 5 == 0)
// Calculate conditional HLL values (note that '0' is included in each of them as additional value, so we will subtract it later)
| summarize hll_even = hll(iif(isEven, x, 0), 2),
hll_mod3 = hll(iif(isMod3, x, 0), 2),
hll_mod5 = hll(iif(isMod5, x, 0), 2)
// Invoke the plugin that calculates dcount intersections
| evaluate dcount_intersect(hll_even, hll_mod3, hll_mod5)
| project evenNumbers = s0 - 1,              // 100 / 2 = 50
          even_and_mod3 = s1 - 1,            // lcm(2,3) = 6, therefore: 100 / 6 = 16
          even_and_mod3_and_mod5 = s2 - 1    // lcm(2,3,5) = 30, therefore: 100 / 30 = 3
Output
evenNumbers | even_and_mod3 | even_and_mod3_and_mod5 |
---|---|---|
50 | 16 | 3 |
2.2 - infer_storage_schema plugin
This plugin infers the schema of external data and returns it as a CSL schema string. The string can be used when creating external tables. The plugin is invoked with the evaluate operator.
Authentication and authorization
In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.
The following table lists the supported authentication methods and any required permissions by storage type.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Storage Blob Data Reader | Reader |
Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
Microsoft Entra access token | ||
Storage account access key | | This authentication method isn’t supported in Gen1. |
Syntax
evaluate infer_storage_schema( Options )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Options | dynamic | ✔️ | A property bag specifying the properties of the request. |
Supported properties of the request
Name | Type | Required | Description |
---|---|---|---|
StorageContainers | dynamic | ✔️ | An array of storage connection strings that represent prefix URI for stored data artifacts. |
DataFormat | string | ✔️ | One of the supported data formats. |
FileExtension | string | | If specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues. |
FileNamePrefix | string | | If specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process. |
Mode | string | | The schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last. |
InferenceOptions | dynamic | | More inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true}. |
Returns
The infer_storage_schema plugin returns a single result table containing a single row/column holding a CSL schema string.
Example
let options = dynamic({
'StorageContainers': [
h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
],
'FileExtension': '.parquet',
'FileNamePrefix': 'part-',
'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)
Output
CslSchema |
---|
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real |
Use the returned schema in an external table definition:
.create external table MobileEvents(
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real
)
kind=blob
partition by (dt:datetime = bin(event_time, 1d), app:string = app_id)
pathformat = ('app=' app '/dt=' datetime_pattern('yyyyMMdd', dt))
dataformat = parquet
(
h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
)
2.3 - infer_storage_schema_with_suggestions plugin
The infer_storage_schema_with_suggestions plugin infers the schema of external data and returns a JSON object. For each column, the object provides the inferred type, a recommended type, and the recommended mapping transformation. The recommendations are produced by suggestion logic that determines the optimal type as follows:
- Identity columns: If the inferred type for a column is long and the column name ends with id, the suggested type is string, since it provides optimized indexing for identity columns, where equality filters are common.
- Unix datetime columns: If the inferred type for a column is long and one of the unix-time to datetime mapping transformations produces a valid datetime value, the suggested type is datetime and the suggested ApplicableTransformationMapping mapping is the one that produced a valid datetime value.
The plugin is invoked with the evaluate operator. To obtain just the inferred schema, without suggestions, for use with Create and alter Azure Storage external tables, use the infer_storage_schema plugin.
Authentication and authorization
In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.
The following table lists the supported authentication methods and any required permissions by storage type.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Storage Blob Data Reader | Reader |
Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
Microsoft Entra access token | ||
Storage account access key | | This authentication method isn’t supported in Gen1. |
Syntax
evaluate infer_storage_schema_with_suggestions( Options )
Parameters
Name | Type | Required | Description |
---|---|---|---|
Options | dynamic | ✔️ | A property bag specifying the properties of the request. |
Supported properties of the request
Name | Type | Required | Description |
---|---|---|---|
StorageContainers | dynamic | ✔️ | An array of storage connection strings that represent prefix URI for stored data artifacts. |
DataFormat | string | ✔️ | One of the data formats supported for ingestion. |
FileExtension | string | | If specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues. |
FileNamePrefix | string | | If specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process. |
Mode | string | | The schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last. |
InferenceOptions | dynamic | | More inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true}. |
Returns
The infer_storage_schema_with_suggestions plugin returns a single result table containing a single row/column holding a JSON string.
Example
let options = dynamic({
'StorageContainers': [
h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
],
'FileExtension': '.json',
'FileNamePrefix': 'js-',
'DataFormat': 'json'
});
evaluate infer_storage_schema_with_suggestions(options)
Example input data
{
"source": "DataExplorer",
"created_at": "2022-04-10 15:47:57",
"author_id": 739144091473215488,
"time_millisec":1547083647000
}
Output
{
"Columns": [
{
"OriginalColumn": {
"Name": "source",
"CslType": {
"type": "string",
"IsNumeric": false,
"IsSummable": false
}
},
"RecommendedColumn": {
"Name": "source",
"CslType": {
"type": "string",
"IsNumeric": false,
"IsSummable": false
}
},
"ApplicableTransformationMapping": "None"
},
{
"OriginalColumn": {
"Name": "created_at",
"CslType": {
"type": "datetime",
"IsNumeric": false,
"IsSummable": true
}
},
"RecommendedColumn": {
"Name": "created_at",
"CslType": {
"type": "datetime",
"IsNumeric": false,
"IsSummable": true
}
},
"ApplicableTransformationMapping": "None"
},
{
"OriginalColumn": {
"Name": "author_id",
"CslType": {
"type": "long",
"IsNumeric": true,
"IsSummable": true
}
},
"RecommendedColumn": {
"Name": "author_id",
"CslType": {
"type": "string",
"IsNumeric": false,
"IsSummable": false
}
},
"ApplicableTransformationMapping": "None"
},
{
"OriginalColumn": {
"Name": "time_millisec",
"CslType": {
"type": "long",
"IsNumeric": true,
"IsSummable": true
}
},
"RecommendedColumn": {
"Name": "time_millisec",
"CslType": {
"type": "datetime",
"IsNumeric": false,
"IsSummable": true
}
},
"ApplicableTransformationMapping": "DateTimeFromUnixMilliseconds"
}
]
}
2.4 - ipv4_lookup plugin
The ipv4_lookup plugin looks up an IPv4 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate ipv4_lookup( LookupTable , SourceIPv4Key , IPv4LookupKey [, ExtraKey1 [.. , ExtraKeyN [, return_unmatched ]]] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose column SourceIPv4Key is used for IPv4 matching. |
LookupTable | string | ✔️ | Table or tabular expression with IPv4 lookup data, whose column LookupKey is used for IPv4 matching. IPv4 values can be masked using IP-prefix notation. |
SourceIPv4Key | string | ✔️ | The column of T with IPv4 string to be looked up in LookupTable. IPv4 values can be masked using IP-prefix notation. |
IPv4LookupKey | string | ✔️ | The column of LookupTable with IPv4 string that is matched against each SourceIPv4Key value. |
ExtraKey1 .. ExtraKeyN | string | | Additional column references that are used for lookup matches. Similar to the join operation: records with equal values are considered matching. Column name references must exist both in the source table T and in LookupTable. |
return_unmatched | bool | | A boolean flag that defines whether the result should include all rows or only matching rows (default: false - only matching rows are returned). |
Returns
The ipv4_lookup plugin returns the result of a join (lookup) based on the IPv4 key. The schema of the table is the union of the source table and the lookup table, similar to the result of the lookup operator.
If the return_unmatched argument is set to true, the resulting table includes both matched and unmatched rows (filled with nulls).
If the return_unmatched argument is set to false, or omitted (the default value of false is used), the resulting table has as many records as matching results. This variant of lookup has better performance compared to return_unmatched=true execution.
Examples
IPv4 lookup - matching rows only
// IP lookup table: IP_Data
// Partial data from: https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
"111.68.128.0/17","AS","Asia","JP","Japan",
"5.8.0.0/19","EU","Europe","RU","Russia",
"223.255.254.0/24","AS","Asia","SG","Singapore",
"46.36.200.51/32","OC","Oceania","CK","Cook Islands",
"2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
'2.20.183.12', // United Kingdom
'5.8.1.2', // Russia
'192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network)
Output
ip | network | continent_code | continent_name | country_iso_code | country_name |
---|---|---|---|---|---|
2.20.183.12 | 2.20.183.0/24 | EU | Europe | GB | United Kingdom |
5.8.1.2 | 5.8.0.0/19 | EU | Europe | RU | Russia |
IPv4 lookup - return both matching and nonmatching rows
// IP lookup table: IP_Data
// Partial data from:
// https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string,continent_code:string ,continent_name:string ,country_iso_code:string ,country_name:string )
[
"111.68.128.0/17","AS","Asia","JP","Japan",
"5.8.0.0/19","EU","Europe","RU","Russia",
"223.255.254.0/24","AS","Asia","SG","Singapore",
"46.36.200.51/32","OC","Oceania","CK","Cook Islands",
"2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
'2.20.183.12', // United Kingdom
'5.8.1.2', // Russia
'192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)
Output
ip | network | continent_code | continent_name | country_iso_code | country_name |
---|---|---|---|---|---|
2.20.183.12 | 2.20.183.0/24 | EU | Europe | GB | United Kingdom |
5.8.1.2 | 5.8.0.0/19 | EU | Europe | RU | Russia |
192.165.12.17 |
IPv4 lookup - using source in external_data()
let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string)
[
'2.20.183.12', // United Kingdom
'5.8.1.2', // Russia
'192.165.12.17', // Sweden
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)
Output
ip | network | geoname_id | continent_code | continent_name | country_iso_code | country_name | is_anonymous_proxy | is_satellite_provider |
---|---|---|---|---|---|---|---|---|
2.20.183.12 | 2.20.183.0/24 | 2635167 | EU | Europe | GB | United Kingdom | 0 | 0 |
5.8.1.2 | 5.8.0.0/19 | 2017370 | EU | Europe | RU | Russia | 0 | 0 |
192.165.12.17 | 192.165.8.0/21 | 2661886 | EU | Europe | SE | Sweden | 0 | 0 |
IPv4 lookup - using extra columns for matching
let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string, continent_name:string, country_iso_code:string)
[
'2.20.183.12', 'Europe', 'GB', // United Kingdom
'5.8.1.2', 'Europe', 'RU', // Russia
'192.165.12.17', 'Europe', '', // Sweden is 'SE' - so it won't be matched
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, continent_name, country_iso_code)
Output
ip | continent_name | country_iso_code | network | geoname_id | continent_code | country_name | is_anonymous_proxy | is_satellite_provider |
---|---|---|---|---|---|---|---|---|
2.20.183.12 | Europe | GB | 2.20.183.0/24 | 2635167 | EU | United Kingdom | 0 | 0 |
5.8.1.2 | Europe | RU | 5.8.0.0/19 | 2017370 | EU | Russia | 0 | 0 |
Related content
- Overview of IPv4/IPv6 functions
- Overview of IPv4 text match functions
2.5 - ipv6_lookup plugin
The ipv6_lookup plugin looks up an IPv6 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate ipv6_lookup( LookupTable , SourceIPv6Key , IPv6LookupKey [, return_unmatched ] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input whose column SourceIPv6Key is used for IPv6 matching. |
LookupTable | string | ✔️ | Table or tabular expression with IPv6 lookup data, whose column LookupKey is used for IPv6 matching. IPv6 values can be masked using IP-prefix notation. |
SourceIPv6Key | string | ✔️ | The column of T with IPv6 string to be looked up in LookupTable. IPv6 values can be masked using IP-prefix notation. |
IPv6LookupKey | string | ✔️ | The column of LookupTable with IPv6 string that is matched against each SourceIPv6Key value. |
return_unmatched | bool | | A boolean flag that defines whether the result should include all rows or only matching rows (default: false - only matching rows are returned). |
Returns
The ipv6_lookup plugin returns the result of a join (lookup) based on the IPv6 key. The schema of the table is the union of the source table and the lookup table, similar to the result of the lookup operator.
If the return_unmatched argument is set to true, the resulting table includes both matched and unmatched rows (filled with nulls).
If the return_unmatched argument is set to false, or omitted (the default value of false is used), the resulting table has as many records as matching results. This variant of lookup has better performance compared to return_unmatched=true execution.
Examples
IPv6 lookup - matching rows only
// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
"2001:0db8:85a3::/48","NA","North America","US","United States",
"2404:6800:4001::/48","AS","Asia","JP","Japan",
"2a00:1450:4001::/48","EU","Europe","DE","Germany",
"2800:3f0:4001::/48","SA","South America","BR","Brazil",
"2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
"2607:f8b0:4001::/48","NA","North America","CA","Canada",
"2a02:26f0:4001::/48","EU","Europe","FR","France",
"2400:cb00:4001::/48","AS","Asia","IN","India",
"2801:0db8:85a3::/48","SA","South America","AR","Argentina",
"2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
"2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
"2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
"2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
"a5e:f127:8a9d:146d:e102:b5d3:c755:abcd", // N/A
"a5e:f127:8a9d:146d:e102:b5d3:c755:abce" // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network)
Output
network | continent_code | continent_name | country_iso_code | country_name | ip |
---|---|---|---|---|---|
2001:0db8:85a3::/48 | NA | North America | US | United States | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
2404:6800:4001::/48 | AS | Asia | JP | Japan | 2404:6800:4001:0001:0000:8a2e:0370:7334 |
2a02:26f0:4001::/48 | EU | Europe | FR | France | 2a02:26f0:4001:0006:0000:8a2e:0370:7334 |
IPv6 lookup - return both matching and nonmatching rows
// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
"2001:0db8:85a3::/48","NA","North America","US","United States",
"2404:6800:4001::/48","AS","Asia","JP","Japan",
"2a00:1450:4001::/48","EU","Europe","DE","Germany",
"2800:3f0:4001::/48","SA","South America","BR","Brazil",
"2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
"2607:f8b0:4001::/48","NA","North America","CA","Canada",
"2a02:26f0:4001::/48","EU","Europe","FR","France",
"2400:cb00:4001::/48","AS","Asia","IN","India",
"2801:0db8:85a3::/48","SA","South America","AR","Argentina",
"2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
"2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
"2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
"2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
"a5e:f127:8a9d:146d:e102:b5d3:c755:abcd", // N/A
"a5e:f127:8a9d:146d:e102:b5d3:c755:abce" // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network, true)
Output
network | continent_code | continent_name | country_iso_code | country_name | ip |
---|---|---|---|---|---|
2001:0db8:85a3::/48 | NA | North America | US | United States | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
2404:6800:4001::/48 | AS | Asia | JP | Japan | 2404:6800:4001:0001:0000:8a2e:0370:7334 |
2a02:26f0:4001::/48 | EU | Europe | FR | France | 2a02:26f0:4001:0006:0000:8a2e:0370:7334 |
a5e:f127:8a9d:146d:e102:b5d3:c755:abcd | |||||
a5e:f127:8a9d:146d:e102:b5d3:c755:abce |
Related content
- Overview of IPv4/IPv6 functions
2.6 - preview plugin
Returns a table with up to the specified number of rows from the input record set, and the total number of records in the input record set.
Syntax
T | evaluate preview( NumberOfRows )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The table to preview. |
NumberOfRows | int | ✔️ | The number of rows to preview from the table. |
Returns
The preview plugin returns two result tables:
- A table with up to the specified number of rows. For example, T | evaluate preview(50) returns this table as the equivalent of running T | take 50.
- A table with a single row/column, holding the number of records in the input record set. This table is equivalent to running T | count.
Example
StormEvents | evaluate preview(5)
Table1
The following output table only includes the first 6 columns. To see the full result, run the query.
StartTime | EndTime | EpisodeId | EventId | State | EventType | … |
---|---|---|---|---|---|---|
2007-12-30T16:00:00Z | 2007-12-30T16:05:00Z | 11749 | 64588 | GEORGIA | Thunderstorm Wind | … |
2007-12-20T07:50:00Z | 2007-12-20T07:53:00Z | 12554 | 68796 | MISSISSIPPI | Thunderstorm Wind | … |
2007-09-29T08:11:00Z | 2007-09-29T08:11:00Z | 11091 | 61032 | ATLANTIC SOUTH | Waterspout | … |
2007-09-20T21:57:00Z | 2007-09-20T22:05:00Z | 11078 | 60913 | FLORIDA | Tornado | … |
2007-09-18T20:00:00Z | 2007-09-19T18:00:00Z | 11074 | 60904 | FLORIDA | Heavy Rain | … |
Table2
Count |
---|
59066 |
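For reference, the two result tables above correspond to the following standalone queries (a minimal sketch of the equivalence described in the Returns section; run each line as its own query):
StormEvents | take 5   // First result table: up to 5 rows
StormEvents | count    // Second result table: total record count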
2.7 - schema_merge plugin
Merges tabular schema definitions into a unified schema.
Schema definitions are expected to be in the format produced by the getschema operator.
The schema merge operation joins columns in input schemas and tries to reduce data types to common ones. If data types can’t be reduced, an error is displayed on the problematic column.
The plugin is invoked with the evaluate operator.
Syntax
T | evaluate schema_merge( PreserveOrder )
Parameters
Name | Type | Required | Description |
---|---|---|---|
PreserveOrder | bool | | When set to true, directs the plugin to validate the column order as defined by the first tabular schema that is kept. If the same column appears in several schemas, its ordinal must match the ordinal of the column in the first schema in which it appeared. The default value is true. |
Returns
The schema_merge plugin returns output similar to what the getschema operator returns.
Examples
Merge with a schema that has a new column appended.
let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, HttpStatus:int, Referrer:string)[] | getschema;
union schema1, schema2 | evaluate schema_merge()
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
Uri | 0 | System.String | string |
HttpStatus | 1 | System.Int32 | int |
Referrer | 2 | System.String | string |
Merge with a schema that has different column ordering (the HttpStatus ordinal changes from 1 to 2 in the new variant).
let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge()
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
Uri | 0 | System.String | string |
Referrer | 1 | System.String | string |
HttpStatus | -1 | ERROR(unknown CSL type:ERROR(columns are out of order)) | ERROR(columns are out of order) |
Merge with a schema that has different column ordering, but with PreserveOrder set to false.
let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge(PreserveOrder = false)
Output
ColumnName | ColumnOrdinal | DataType | ColumnType |
---|---|---|---|
Uri | 0 | System.String | string |
Referrer | 1 | System.String | string |
HttpStatus | 2 | System.Int32 | int |
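Merging schemas in which the same column has irreconcilable data types reports an error on that column, as described above. The following is a minimal sketch of that case, assuming HttpStatus is declared as int in one schema and string in the other; the exact ERROR text reported in the DataType and ColumnType cells may vary:
let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, HttpStatus:string)[] | getschema;
union schema1, schema2 | evaluate schema_merge()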
3 - Language plugins
3.1 - Python plugin
3.2 - Python plugin packages
This article lists the available Python packages in the Python plugin. For more information, see Python plugin.
3.11.7 (Preview)
Python engine 3.11.7 + common data science and ML packages
Package | Version |
---|---|
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arrow | 1.3.0 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gensim | 4.3.2 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib_metadata | 7.1.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
locket | 1.0.0 |
lxml | 5.2.1 |
marisa-trie | 1.1.0 |
MarkupSafe | 2.1.5 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
preshed | 3.0.9 |
protobuf | 5.26.1 |
psutil | 5.9.8 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
PyYAML | 6.0.1 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rpds-py | 0.18.0 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.11.7 DL (Preview)
Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)
Package | Version |
---|---|
absl-py | 2.1.0 |
alembic | 1.13.1 |
aniso8601 | 9.0.1 |
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arch | 7.0.0 |
arrow | 1.3.0 |
astunparse | 1.6.3 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
cachetools | 5.3.3 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
Deprecated | 1.2.14 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
docker | 7.1.0 |
entrypoints | 0.4 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gast | 0.5.4 |
gensim | 4.3.2 |
gitdb | 4.0.11 |
GitPython | 3.1.43 |
google-pasta | 0.2.0 |
graphene | 3.3 |
graphql-core | 3.2.3 |
graphql-relay | 3.2.0 |
greenlet | 3.0.3 |
grpcio | 1.64.0 |
h5py | 3.11.0 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib-metadata | 7.0.0 |
iniconfig | 2.0.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
keras | 3.3.3 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
libclang | 18.1.1 |
locket | 1.0.0 |
lxml | 5.2.1 |
Mako | 1.3.5 |
marisa-trie | 1.1.0 |
Markdown | 3.6 |
markdown-it-py | 3.0.0 |
MarkupSafe | 2.1.5 |
mdurl | 0.1.2 |
ml-dtypes | 0.3.2 |
mlflow | 2.13.0 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
namex | 0.0.8 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
opentelemetry-api | 1.24.0 |
opentelemetry-sdk | 1.24.0 |
opentelemetry-semantic-conventions | 0.45b0 |
opt-einsum | 3.3.0 |
optree | 0.11.0 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
pluggy | 1.5.0 |
preshed | 3.0.9 |
protobuf | 4.25.3 |
psutil | 5.9.8 |
pyarrow | 15.0.2 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
Pygments | 2.18.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
pytest | 8.2.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
pywin32 | 306 |
PyYAML | 6.0.1 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rich | 13.7.1 |
rpds-py | 0.18.0 |
rstl | 0.1.3 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
seasonal | 0.3.1 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
smmap | 5.0.1 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
SQLAlchemy | 2.0.30 |
sqlparse | 0.5.0 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
tensorboard | 2.16.2 |
tensorboard-data-server | 0.7.2 |
tensorflow | 2.16.1 |
tensorflow-intel | 2.16.1 |
tensorflow-io-gcs-filesystem | 0.31.0 |
termcolor | 2.4.0 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
time-series-anomaly-detector | 0.2.7 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
torch | 2.2.2 |
torchaudio | 2.2.2 |
torchvision | 0.17.2 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
waitress | 3.0.0 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
wrapt | 1.16.0 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.10.8
Python engine 3.10.8 + common data science and ML packages
Package | Version |
---|---|
alembic | 1.11.1 |
anytree | 2.8.0 |
arrow | 1.2.3 |
attrs | 22.2.0 |
blis | 0.7.9 |
Bottleneck | 1.3.5 |
Brotli | 1.0.9 |
brotlipy | 0.7.0 |
catalogue | 2.0.8 |
certifi | 2022.12.7 |
cffi | 1.15.1 |
chardet | 5.0.0 |
charset-normalizer | 2.1.1 |
click | 8.1.3 |
cloudpickle | 2.2.1 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.0.4 |
contourpy | 1.0.7 |
cycler | 0.11.0 |
cymem | 2.0.7 |
Cython | 0.29.28 |
daal | 2021.6.0 |
daal4py | 2021.6.3 |
dask | 2022.10.2 |
databricks-cli | 0.17.7 |
diff-match-patch | 20200713 |
dill | 0.3.6 |
distributed | 2022.10.2 |
docker | 6.1.3 |
entrypoints | 0.4 |
filelock | 3.9.1 |
flashtext | 2.7 |
Flask | 2.2.3 |
Flask-Compress | 1.13 |
flatbuffers | 23.3.3 |
fonttools | 4.39.0 |
fsspec | 2023.3.0 |
gensim | 4.2.0 |
gitdb | 4.0.10 |
GitPython | 3.1.31 |
greenlet | 2.0.2 |
HeapDict | 1.0.1 |
humanfriendly | 10.0 |
idna | 3.4 |
importlib-metadata | 6.7.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.1.2 |
jellyfish | 0.9.0 |
Jinja2 | 3.1.2 |
jmespath | 1.0.1 |
joblib | 1.2.0 |
json5 | 0.9.10 |
jsonschema | 4.16.0 |
kiwisolver | 1.4.4 |
langcodes | 3.3.0 |
locket | 1.0.0 |
lxml | 4.9.1 |
Mako | 1.2.4 |
Markdown | 3.4.3 |
MarkupSafe | 2.1.2 |
mlflow | 2.4.1 |
mlxtend | 0.21.0 |
mpmath | 1.3.0 |
msgpack | 1.0.5 |
murmurhash | 1.0.9 |
networkx | 2.8.7 |
nltk | 3.7 |
numpy | 1.23.4 |
oauthlib | 3.2.2 |
onnxruntime | 1.13.1 |
packaging | 23.0 |
pandas | 1.5.1 |
partd | 1.3.0 |
pathy | 0.10.1 |
patsy | 0.5.3 |
Pillow | 9.4.0 |
pip | 23.0.1 |
platformdirs | 2.5.2 |
plotly | 5.11.0 |
ply | 3.11 |
preshed | 3.0.8 |
protobuf | 4.22.1 |
psutil | 5.9.3 |
pyarrow | 12.0.1 |
pycparser | 2.21 |
pydantic | 1.10.6 |
pyfpgrowth | 1.0 |
PyJWT | 2.7.0 |
pyparsing | 3.0.9 |
pyreadline3 | 3.4.1 |
pyrsistent | 0.19.3 |
python-dateutil | 2.8.2 |
pytz | 2022.7.1 |
PyWavelets | 1.4.1 |
pywin32 | 306 |
PyYAML | 6.0 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
regex | 2022.10.31 |
requests | 2.28.2 |
requests-file | 1.5.1 |
scikit-learn | 1.1.3 |
scipy | 1.9.3 |
setuptools | 67.6.0 |
sip | 6.7.3 |
six | 1.16.0 |
smart-open | 6.3.0 |
smmap | 5.0.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.4.2 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.4 |
SQLAlchemy | 2.0.18 |
sqlparse | 0.4.4 |
srsly | 2.4.5 |
statsmodels | 0.13.2 |
sympy | 1.11.1 |
tabulate | 0.9.0 |
tbb | 2021.7.1 |
tblib | 1.7.0 |
tenacity | 8.2.2 |
textdistance | 4.5.0 |
thinc | 8.1.9 |
threadpoolctl | 3.1.0 |
three-merge | 0.1.1 |
tldextract | 3.4.0 |
toml | 0.10.2 |
toolz | 0.12.0 |
tornado | 6.1 |
tqdm | 4.65.0 |
typer | 0.4.2 |
typing_extensions | 4.5.0 |
ujson | 5.5.0 |
Unidecode | 1.3.6 |
urllib3 | 1.26.15 |
waitress | 2.1.2 |
wasabi | 0.10.1 |
websocket-client | 1.6.1 |
Werkzeug | 2.2.3 |
wheel | 0.40.0 |
xarray | 2022.10.0 |
zict | 2.2.0 |
zipp | 3.15.0 |
3.10.8 DL
Not supported
3.6.5 (Legacy)
Not supported
This article lists the available managed Python packages in the Python plugin. For more information, see Python plugin.
To create a custom image, see Create a custom image.
3.11.7 (Preview)
Python engine 3.11.7 + common data science and ML packages
Package | Version |
---|---|
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arrow | 1.3.0 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gensim | 4.3.2 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib_metadata | 7.1.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
locket | 1.0.0 |
lxml | 5.2.1 |
marisa-trie | 1.1.0 |
MarkupSafe | 2.1.5 |
matplotlib | 3.8.4 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
preshed | 3.0.9 |
protobuf | 5.26.1 |
psutil | 5.9.8 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
PyYAML | 6.0.1 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rpds-py | 0.18.0 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.11.7 DL (Preview)
Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)
Package | Version |
---|---|
absl-py | 2.1.0 |
alembic | 1.13.1 |
aniso8601 | 9.0.1 |
annotated-types | 0.6.0 |
anytree | 2.12.1 |
arch | 7.0.0 |
arrow | 1.3.0 |
astunparse | 1.6.3 |
attrs | 23.2.0 |
blinker | 1.7.0 |
blis | 0.7.11 |
Bottleneck | 1.3.8 |
Brotli | 1.1.0 |
brotlipy | 0.7.0 |
cachetools | 5.3.3 |
catalogue | 2.0.10 |
certifi | 2024.2.2 |
cffi | 1.16.0 |
chardet | 5.2.0 |
charset-normalizer | 3.3.2 |
click | 8.1.7 |
cloudpathlib | 0.16.0 |
cloudpickle | 3.0.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.1.4 |
contourpy | 1.2.1 |
cycler | 0.12.1 |
cymem | 2.0.8 |
Cython | 3.0.10 |
daal | 2024.3.0 |
daal4py | 2024.3.0 |
dask | 2024.4.2 |
Deprecated | 1.2.14 |
diff-match-patch | 20230430 |
dill | 0.3.8 |
distributed | 2024.4.2 |
docker | 7.1.0 |
entrypoints | 0.4 |
filelock | 3.13.4 |
flashtext | 2.7 |
Flask | 3.0.3 |
Flask-Compress | 1.15 |
flatbuffers | 24.3.25 |
fonttools | 4.51.0 |
fsspec | 2024.3.1 |
gast | 0.5.4 |
gensim | 4.3.2 |
gitdb | 4.0.11 |
GitPython | 3.1.43 |
google-pasta | 0.2.0 |
graphene | 3.3 |
graphql-core | 3.2.3 |
graphql-relay | 3.2.0 |
greenlet | 3.0.3 |
grpcio | 1.64.0 |
h5py | 3.11.0 |
humanfriendly | 10.0 |
idna | 3.7 |
importlib-metadata | 7.0.0 |
iniconfig | 2.0.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.2.0 |
jellyfish | 1.0.3 |
Jinja2 | 3.1.3 |
jmespath | 1.0.1 |
joblib | 1.4.0 |
json5 | 0.9.25 |
jsonschema | 4.21.1 |
jsonschema-specifications | 2023.12.1 |
keras | 3.3.3 |
kiwisolver | 1.4.5 |
langcodes | 3.4.0 |
language_data | 1.2.0 |
libclang | 18.1.1 |
locket | 1.0.0 |
lxml | 5.2.1 |
Mako | 1.3.5 |
marisa-trie | 1.1.0 |
Markdown | 3.6 |
markdown-it-py | 3.0.0 |
MarkupSafe | 2.1.5 |
matplotlib | 3.8.4 |
mdurl | 0.1.2 |
ml-dtypes | 0.3.2 |
mlflow | 2.13.0 |
mlxtend | 0.23.1 |
mpmath | 1.3.0 |
msgpack | 1.0.8 |
murmurhash | 1.0.10 |
namex | 0.0.8 |
networkx | 3.3 |
nltk | 3.8.1 |
numpy | 1.26.4 |
onnxruntime | 1.17.3 |
opentelemetry-api | 1.24.0 |
opentelemetry-sdk | 1.24.0 |
opentelemetry-semantic-conventions | 0.45b0 |
opt-einsum | 3.3.0 |
optree | 0.11.0 |
packaging | 24.0 |
pandas | 2.2.2 |
partd | 1.4.1 |
patsy | 0.5.6 |
pillow | 10.3.0 |
platformdirs | 4.2.1 |
plotly | 5.21.0 |
pluggy | 1.5.0 |
preshed | 3.0.9 |
protobuf | 4.25.3 |
psutil | 5.9.8 |
pyarrow | 15.0.2 |
pycparser | 2.22 |
pydantic | 2.7.1 |
pydantic_core | 2.18.2 |
pyfpgrowth | 1.0 |
Pygments | 2.18.0 |
pyparsing | 3.1.2 |
pyreadline3 | 3.4.1 |
pytest | 8.2.1 |
python-dateutil | 2.9.0.post0 |
pytz | 2024.1 |
PyWavelets | 1.6.0 |
pywin32 | 306 |
PyYAML | 6.0.1 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
referencing | 0.35.0 |
regex | 2024.4.16 |
requests | 2.31.0 |
requests-file | 2.0.0 |
rich | 13.7.1 |
rpds-py | 0.18.0 |
rstl | 0.1.3 |
scikit-learn | 1.4.2 |
scipy | 1.13.0 |
seasonal | 0.3.1 |
sip | 6.8.3 |
six | 1.16.0 |
smart-open | 6.4.0 |
smmap | 5.0.1 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.7.4 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.5 |
SQLAlchemy | 2.0.30 |
sqlparse | 0.5.0 |
srsly | 2.4.8 |
statsmodels | 0.14.2 |
sympy | 1.12 |
tbb | 2021.12.0 |
tblib | 3.0.0 |
tenacity | 8.2.3 |
tensorboard | 2.16.2 |
tensorboard-data-server | 0.7.2 |
tensorflow | 2.16.1 |
tensorflow-intel | 2.16.1 |
tensorflow-io-gcs-filesystem | 0.31.0 |
termcolor | 2.4.0 |
textdistance | 4.6.2 |
thinc | 8.2.3 |
threadpoolctl | 3.4.0 |
three-merge | 0.1.1 |
time-series-anomaly-detector | 0.2.7 |
tldextract | 5.1.2 |
toolz | 0.12.1 |
torch | 2.2.2 |
torchaudio | 2.2.2 |
torchvision | 0.17.2 |
tornado | 6.4 |
tqdm | 4.66.2 |
typer | 0.9.4 |
types-python-dateutil | 2.9.0.20240316 |
typing_extensions | 4.11.0 |
tzdata | 2024.1 |
ujson | 5.9.0 |
Unidecode | 1.3.8 |
urllib3 | 2.2.1 |
waitress | 3.0.0 |
wasabi | 1.1.2 |
weasel | 0.3.4 |
Werkzeug | 3.0.2 |
wrapt | 1.16.0 |
xarray | 2024.3.0 |
zict | 3.0.0 |
zipp | 3.18.1 |
zstandard | 0.22.0 |
3.10.8
Python engine 3.10.8 + common data science and ML packages
Package | Version |
---|---|
alembic | 1.11.1 |
anytree | 2.8.0 |
arrow | 1.2.3 |
attrs | 22.2.0 |
blis | 0.7.9 |
Bottleneck | 1.3.5 |
Brotli | 1.0.9 |
brotlipy | 0.7.0 |
catalogue | 2.0.8 |
certifi | 2022.12.7 |
cffi | 1.15.1 |
chardet | 5.0.0 |
charset-normalizer | 2.1.1 |
click | 8.1.3 |
cloudpickle | 2.2.1 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.0.4 |
contourpy | 1.0.7 |
cycler | 0.11.0 |
cymem | 2.0.7 |
Cython | 0.29.28 |
daal | 2021.6.0 |
daal4py | 2021.6.3 |
dask | 2022.10.2 |
databricks-cli | 0.17.7 |
diff-match-patch | 20200713 |
dill | 0.3.6 |
distributed | 2022.10.2 |
docker | 6.1.3 |
entrypoints | 0.4 |
filelock | 3.9.1 |
flashtext | 2.7 |
Flask | 2.2.3 |
Flask-Compress | 1.13 |
flatbuffers | 23.3.3 |
fonttools | 4.39.0 |
fsspec | 2023.3.0 |
gensim | 4.2.0 |
gitdb | 4.0.10 |
GitPython | 3.1.31 |
greenlet | 2.0.2 |
HeapDict | 1.0.1 |
humanfriendly | 10.0 |
idna | 3.4 |
importlib-metadata | 6.7.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.1.2 |
jellyfish | 0.9.0 |
Jinja2 | 3.1.2 |
jmespath | 1.0.1 |
joblib | 1.2.0 |
json5 | 0.9.10 |
jsonschema | 4.16.0 |
kiwisolver | 1.4.4 |
langcodes | 3.3.0 |
locket | 1.0.0 |
lxml | 4.9.1 |
Mako | 1.2.4 |
Markdown | 3.4.3 |
MarkupSafe | 2.1.2 |
mlflow | 2.4.1 |
mlxtend | 0.21.0 |
mpmath | 1.3.0 |
msgpack | 1.0.5 |
murmurhash | 1.0.9 |
networkx | 2.8.7 |
nltk | 3.7 |
numpy | 1.23.4 |
oauthlib | 3.2.2 |
onnxruntime | 1.13.1 |
packaging | 23.0 |
pandas | 1.5.1 |
partd | 1.3.0 |
pathy | 0.10.1 |
patsy | 0.5.3 |
Pillow | 9.4.0 |
pip | 23.0.1 |
platformdirs | 2.5.2 |
plotly | 5.11.0 |
ply | 3.11 |
preshed | 3.0.8 |
protobuf | 4.22.1 |
psutil | 5.9.3 |
pyarrow | 12.0.1 |
pycparser | 2.21 |
pydantic | 1.10.6 |
pyfpgrowth | 1.0 |
PyJWT | 2.7.0 |
pyparsing | 3.0.9 |
pyreadline3 | 3.4.1 |
pyrsistent | 0.19.3 |
python-dateutil | 2.8.2 |
pytz | 2022.7.1 |
PyWavelets | 1.4.1 |
pywin32 | 306 |
PyYAML | 6.0 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
regex | 2022.10.31 |
requests | 2.28.2 |
requests-file | 1.5.1 |
scikit-learn | 1.1.3 |
scipy | 1.9.3 |
setuptools | 67.6.0 |
sip | 6.7.3 |
six | 1.16.0 |
smart-open | 6.3.0 |
smmap | 5.0.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.4.2 |
spacy-legacy | 3.0.12 |
spacy-loggers | 1.0.4 |
SQLAlchemy | 2.0.18 |
sqlparse | 0.4.4 |
srsly | 2.4.5 |
statsmodels | 0.13.2 |
sympy | 1.11.1 |
tabulate | 0.9.0 |
tbb | 2021.7.1 |
tblib | 1.7.0 |
tenacity | 8.2.2 |
textdistance | 4.5.0 |
thinc | 8.1.9 |
threadpoolctl | 3.1.0 |
three-merge | 0.1.1 |
tldextract | 3.4.0 |
toml | 0.10.2 |
toolz | 0.12.0 |
tornado | 6.1 |
tqdm | 4.65.0 |
typer | 0.4.2 |
typing_extensions | 4.5.0 |
ujson | 5.5.0 |
Unidecode | 1.3.6 |
urllib3 | 1.26.15 |
waitress | 2.1.2 |
wasabi | 0.10.1 |
websocket-client | 1.6.1 |
Werkzeug | 2.2.3 |
wheel | 0.40.0 |
xarray | 2022.10.0 |
zict | 2.2.0 |
zipp | 3.15.0 |
3.10.8 DL
Python engine 3.10.8 + common data science and ML packages + deep learning packages (tensorflow & torch)
Package | Version |
---|---|
absl-py | 1.4.0 |
alembic | 1.11.1 |
anytree | 2.8.0 |
arrow | 1.2.3 |
astunparse | 1.6.3 |
attrs | 22.1.0 |
blis | 0.7.9 |
Bottleneck | 1.3.5 |
Brotli | 1.0.9 |
brotlipy | 0.7.0 |
cachetools | 5.3.0 |
catalogue | 2.0.8 |
certifi | 2022.9.24 |
cffi | 1.15.1 |
chardet | 5.0.0 |
charset-normalizer | 2.1.1 |
click | 8.1.3 |
cloudpickle | 2.2.0 |
colorama | 0.4.6 |
coloredlogs | 15.0.1 |
confection | 0.0.3 |
contourpy | 1.0.6 |
cycler | 0.11.0 |
cymem | 2.0.7 |
Cython | 0.29.28 |
daal | 2021.6.0 |
daal4py | 2021.6.3 |
dask | 2022.10.2 |
databricks-cli | 0.17.7 |
diff-match-patch | 20200713 |
dill | 0.3.6 |
distributed | 2022.10.2 |
docker | 6.1.3 |
entrypoints | 0.4 |
filelock | 3.8.0 |
flashtext | 2.7 |
Flask | 2.2.2 |
Flask-Compress | 1.13 |
flatbuffers | 22.10.26 |
fonttools | 4.38.0 |
fsspec | 2022.10.0 |
gast | 0.4.0 |
gensim | 4.2.0 |
gitdb | 4.0.10 |
GitPython | 3.1.31 |
google-auth | 2.16.2 |
google-auth-oauthlib | 0.4.6 |
google-pasta | 0.2.0 |
greenlet | 2.0.2 |
grpcio | 1.51.3 |
h5py | 3.8.0 |
HeapDict | 1.0.1 |
humanfriendly | 10.0 |
idna | 3.4 |
importlib-metadata | 6.7.0 |
intervaltree | 3.1.0 |
itsdangerous | 2.1.2 |
jax | 0.4.6 |
jellyfish | 0.9.0 |
Jinja2 | 3.1.2 |
jmespath | 1.0.1 |
joblib | 1.2.0 |
json5 | 0.9.10 |
jsonschema | 4.16.0 |
keras | 2.12.0 |
kiwisolver | 1.4.4 |
langcodes | 3.3.0 |
libclang | 16.0.0 |
locket | 1.0.0 |
lxml | 4.9.1 |
Mako | 1.2.4 |
Markdown | 3.4.2 |
MarkupSafe | 2.1.1 |
mlflow | 2.4.1 |
mlxtend | 0.21.0 |
mpmath | 1.2.1 |
msgpack | 1.0.4 |
murmurhash | 1.0.9 |
networkx | 2.8.7 |
nltk | 3.7 |
numpy | 1.23.4 |
oauthlib | 3.2.2 |
onnxruntime | 1.13.1 |
opt-einsum | 3.3.0 |
packaging | 21.3 |
pandas | 1.5.1 |
partd | 1.3.0 |
pathy | 0.6.2 |
patsy | 0.5.3 |
Pillow | 9.3.0 |
pip | 23.0.1 |
platformdirs | 2.5.2 |
plotly | 5.11.0 |
ply | 3.11 |
preshed | 3.0.8 |
protobuf | 4.21.9 |
psutil | 5.9.3 |
pyarrow | 12.0.1 |
pyasn1 | 0.4.8 |
pyasn1-modules | 0.2.8 |
pycparser | 2.21 |
pydantic | 1.10.2 |
pyfpgrowth | 1.0 |
PyJWT | 2.7.0 |
pyparsing | 3.0.9 |
pyreadline3 | 3.4.1 |
pyrsistent | 0.19.1 |
python-dateutil | 2.8.2 |
pytz | 2022.5 |
PyWavelets | 1.4.1 |
pywin32 | 306 |
PyYAML | 6.0 |
querystring-parser | 1.2.4 |
queuelib | 1.6.2 |
regex | 2022.10.31 |
requests | 2.28.1 |
requests-file | 1.5.1 |
requests-oauthlib | 1.3.1 |
rsa | 4.9 |
scikit-learn | 1.1.3 |
scipy | 1.9.3 |
setuptools | 67.6.0 |
sip | 6.7.3 |
six | 1.16.0 |
smart-open | 5.2.1 |
smmap | 5.0.0 |
snowballstemmer | 2.2.0 |
sortedcollections | 2.1.0 |
sortedcontainers | 2.4.0 |
spacy | 3.4.2 |
spacy-legacy | 3.0.10 |
spacy-loggers | 1.0.3 |
SQLAlchemy | 2.0.18 |
sqlparse | 0.4.4 |
srsly | 2.4.5 |
statsmodels | 0.13.2 |
sympy | 1.11.1 |
tabulate | 0.9.0 |
tbb | 2021.7.0 |
tblib | 1.7.0 |
tenacity | 8.1.0 |
tensorboard | 2.12.0 |
tensorboard-data-server | 0.7.0 |
tensorboard-plugin-wit | 1.8.1 |
tensorflow | 2.12.0 |
tensorflow-estimator | 2.12.0 |
tensorflow-intel | 2.12.0 |
tensorflow-io-gcs-filesystem | 0.31.0 |
termcolor | 2.2.0 |
textdistance | 4.5.0 |
thinc | 8.1.5 |
threadpoolctl | 3.1.0 |
three-merge | 0.1.1 |
tldextract | 3.4.0 |
toml | 0.10.2 |
toolz | 0.12.0 |
torch | 2.0.0 |
torchaudio | 2.0.1 |
torchvision | 0.15.1 |
tornado | 6.1 |
tqdm | 4.64.1 |
typer | 0.4.2 |
typing_extensions | 4.4.0 |
ujson | 5.5.0 |
Unidecode | 1.3.6 |
urllib3 | 1.26.12 |
waitress | 2.1.2 |
wasabi | 0.10.1 |
websocket-client | 1.6.1 |
Werkzeug | 2.2.2 |
wheel | 0.40.0 |
wrapt | 1.14.1 |
xarray | 2022.10.0 |
zict | 2.2.0 |
zipp | 3.15.0 |
3.6.5 (Legacy)
Package | Version |
---|---|
adal | 1.2.0 |
anaconda_navigator | 1.8.7 |
anytree | 2.8.0 |
argparse | 1.1 |
asn1crypto | 0.24.0 |
astor | 0.7.1 |
astroid | 1.6.3 |
astropy | 3.0.2 |
attr | 18.1.0 |
babel | 2.5.3 |
backcall | 0.1.0 |
bitarray | 0.8.1 |
bleach | 2.1.3 |
bokeh | 0.12.16 |
boto | 2.48.0 |
boto3 | 1.9.109 |
botocore | 1.12.109 |
bottleneck | 1.2.1 |
bs4 | 4.6.0 |
certifi | 2018.04.16 |
cffi | 1.11.5 |
cgi | 2.6 |
chardet | 3.0.4 |
click | 6.7 |
cloudpickle | 0.5.3 |
clyent | 1.2.2 |
colorama | 0.3.9 |
conda | 4.5.4 |
conda_build | 3.10.5 |
conda_env | 4.5.4 |
conda_verify | 2.0.0 |
Crypto | 2.6.1 |
cryptography | 2.2.2 |
csv | 1 |
ctypes | 1.1.0 |
cycler | 0.10.0 |
cython | 0.28.2 |
Cython | 0.28.2 |
cytoolz | 0.9.0.1 |
dask | 0.17.5 |
datashape | 0.5.4 |
dateutil | 2.7.3 |
decimal | 1.7 |
decorator | 4.3.0 |
dill | 0.2.8.2 |
distributed | 1.21.8 |
distutils | 3.6.5 |
docutils | 0.14 |
entrypoints | 0.2.3 |
et_xmlfile | 1.0.1 |
fastcache | 1.0.2 |
filelock | 3.0.4 |
flask | 1.0.2 |
flask_cors | 3.0.4 |
future | 0.17.1 |
gensim | 3.7.1 |
geohash | 0.8.5 |
gevent | 1.3.0 |
glob2 | “(0, 6)” |
greenlet | 0.4.13 |
h5py | 2.7.1 |
html5lib | 1.0.1 |
idna | 2.6 |
imageio | 2.3.0 |
imaplib | 2.58 |
ipaddress | 1 |
IPython | 6.4.0 |
ipython_genutils | 0.2.0 |
isort | 4.3.4 |
jdcal | 1.4 |
jedi | 0.12.0 |
jinja2 | 2.1 |
jmespath | 0.9.4 |
joblib | 0.13.0 |
json | 2.0.9 |
jsonschema | 2.6.0 |
jupyter_core | 4.4.0 |
jupyterlab | 0.32.1 |
jwt | 1.7.1 |
keras | 2.2.4 |
keras_applications | 1.0.6 |
keras_preprocessing | 1.0.5 |
kiwisolver | 1.0.1 |
lazy_object_proxy | 1.3.1 |
llvmlite | 0.23.1 |
logging | 0.5.1.2 |
markdown | 3.0.1 |
markupsafe | 1 |
matplotlib | 2.2.2 |
mccabe | 0.6.1 |
menuinst | 1.4.14 |
mistune | 0.8.3 |
mkl | 1.1.2 |
mlxtend | 0.15.0.0 |
mpmath | 1.0.0 |
msrest | 0.6.2 |
msrestazure | 0.6.0 |
multipledispatch | 0.5.0 |
navigator_updater | 0.2.1 |
nbconvert | 5.3.1 |
nbformat | 4.4.0 |
networkx | 2.1 |
nltk | 3.3 |
nose | 1.3.7 |
notebook | 5.5.0 |
numba | 0.38.0 |
numexpr | 2.6.5 |
numpy | 1.19.1 |
numpydoc | 0.8.0 |
oauthlib | 2.1.0 |
olefile | 0.45.1 |
onnxruntime | 1.4.0 |
openpyxl | 2.5.3 |
OpenSSL | 18.0.0 |
optparse | 1.5.3 |
packaging | 17.1 |
pandas | 0.24.1 |
parso | 0.2.0 |
past | 0.17.1 |
path | 11.0.1 |
patsy | 0.5.0 |
pep8 | 1.7.1 |
phonenumbers | 8.10.6 |
pickleshare | 0.7.4 |
PIL | 5.1.0 |
pint | 0.8.1 |
pip | 21.3.1 |
plac | 0.9.6 |
platform | 1.0.8 |
plotly | 4.8.2 |
pluggy | 0.6.0 |
ply | 3.11 |
prompt_toolkit | 1.0.15 |
psutil | 5.4.5 |
py | 1.5.3 |
pycodestyle | 2.4.0 |
pycosat | 0.6.3 |
pycparser | 2.18 |
pyflakes | 1.6.0 |
pyfpgrowth | 1 |
pygments | 2.2.0 |
pylint | 1.8.4 |
pyparsing | 2.2.0 |
pytest | 3.5.1 |
pytest_arraydiff | 0.2 |
pytz | 2018.4 |
pywt | 0.5.2 |
qtconsole | 4.3.1 |
re | 2.2.1 |
regex | 2.4.136 |
requests | 2.18.4 |
requests_oauthlib | 1.0.0 |
ruamel_yaml | 0.15.35 |
s3transfer | 0.2.0 |
sandbox_utils | 1.2 |
scipy | 1.1.0 |
scrubadub | 1.2.0 |
setuptools | 39.1.0 |
six | 1.11.0 |
sklearn | 0.20.3 |
socketserver | 0.4 |
socks | 1.6.7 |
sortedcollections | 0.6.1 |
sortedcontainers | 1.5.10 |
spacy | 2.0.18 |
sphinx | 1.7.4 |
spyder | 3.2.8 |
sqlalchemy | 1.2.7 |
statsmodels | 0.9.0 |
surprise | 1.0.6 |
sympy | 1.1.1 |
tables | 3.4.3 |
tabnanny | 6 |
tblib | 1.3.2 |
tensorflow | 1.12.0 |
terminado | 0.8.1 |
testpath | 0.3.1 |
textblob | 0.10.0 |
tlz | 0.9.0.1 |
toolz | 0.9.0 |
torch | 1.0.0 |
tqdm | 4.31.1 |
traitlets | 4.3.2 |
ujson | 1.35 |
unicodecsv | 0.14.1 |
urllib3 | 1.22 |
werkzeug | 0.14.1 |
wheel | 0.31.1 |
widgetsnbextension | 3.2.1 |
win32rcparser | 0.11 |
winpty | 0.5.1 |
wrapt | 1.10.11 |
xgboost | 0.81 |
xlsxwriter | 1.0.4 |
yaml | 3.12 |
zict | 0.1.3 |
3.3 - R plugin (Preview)
The R plugin runs a user-defined function (UDF) using an R script.
The script gets tabular data as its input, and produces tabular output. The plugin’s runtime is hosted in a sandbox on the cluster’s nodes. The sandbox provides an isolated and secure environment.
Syntax
T | evaluate [hint.distribution = (single | per_node)] r(output_schema, script [, script_parameters] [, external_artifacts])
Parameters
Name | Type | Required | Description |
---|---|---|---|
output_schema | string | ✔️ | A type literal that defines the output schema of the tabular data, returned by the R code. The format is: typeof( ColumnName: ColumnType[, …]) . For example: typeof(col1:string, col2:long) . To extend the input schema, use the following syntax: typeof(*, col1:string, col2:long) . |
script | string | ✔️ | The valid R script to be executed. |
script_parameters | dynamic | A property bag of name and value pairs to be passed to the R script as the reserved kargs dictionary. For more information, see Reserved R variables. | |
hint.distribution | string | Hint for the plugin’s execution to be distributed across multiple cluster nodes. The default value is single . single means that a single instance of the script will run over the entire query data. per_node means that if the query before the R block is distributed, an instance of the script will run on each node over the data that it contains. | |
external_artifacts | dynamic | A property bag of name and URL pairs for artifacts that are accessible from cloud storage. They can be made available for the script to use at runtime. URLs referenced in this property bag are required to be included in the cluster’s callout policy and in a publicly available location, or contain the necessary credentials, as explained in storage connection strings. The artifacts are made available for the script to consume from a local temporary directory, .\Temp . The names provided in the property bag are used as the local file names. See Example. For more information, see Install packages for the R plugin. |
Reserved R variables
The following variables are reserved for interaction between Kusto Query Language and the R code:
- df: The input tabular data (the values of T above), as an R DataFrame.
- kargs: The value of the script_parameters argument, as an R dictionary.
- result: An R DataFrame created by the R script. The value becomes the tabular data that gets sent to any Kusto query operator that follows the plugin.
Enable the plugin
- The plugin is disabled by default.
- Enable or disable the plugin in the Azure portal in the Configuration tab of your cluster. For more information, see Manage language extensions in your Azure Data Explorer cluster (Preview)
R sandbox image
- The R sandbox image is based on R 3.4.4 for Windows, and includes packages from Anaconda’s R Essentials bundle.
Examples
range x from 1 to 360 step 1
| evaluate r(
//
typeof(*, fx:double), // Output schema: append a new fx column to original table
//
'result <- df\n' // The R decorated script
'n <- nrow(df)\n'
'g <- kargs$gain\n'
'f <- kargs$cycles\n'
'result$fx <- g * sin(df$x / n * 2 * pi * f)'
//
, bag_pack('gain', 100, 'cycles', 4) // dictionary of parameters
)
| render linechart
Performance tips
- Reduce the plugin’s input dataset to the minimum amount required (columns/rows).
- Use filters on the source dataset using the Kusto Query Language, when possible.
- To make a calculation on a subset of the source columns, project only those columns before invoking the plugin.
- Use hint.distribution = per_node whenever the logic in your script is distributable. You can also use the partition operator for partitioning the input dataset.
- Whenever possible, use the Kusto Query Language to implement the logic of your R script.
For example:
.show operations
| where StartedOn > ago(1d) // Filtering out irrelevant records before invoking the plugin
| project d_seconds = Duration / 1s // Projecting only a subset of the necessary columns
| evaluate hint.distribution = per_node r( // Using per_node distribution, as the script's logic allows it
    typeof(*, d2:double),
    'result <- df\n'
    'result$d2 <- df$d_seconds\n' // Negative example: this logic should have been written using Kusto's query language
)
| summarize avg = avg(d2)
Usage tips
To avoid conflicts between Kusto string delimiters and R string delimiters (see the sketch below):
- Use single quote characters (') for Kusto string literals in Kusto queries.
- Use double quote characters (") for R string literals in R scripts.
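For instance, the following minimal sketch (not one of this article’s samples) adds a column from an R string literal; the Kusto literals that carry the script use single quotes, while the string inside the R code uses double quotes:
print x=1
| evaluate r(
    typeof(*, msg:string),
    // Kusto literals use single quotes; the embedded R string uses double quotes
    'result <- df\n'
    'result$msg <- "hello from R"'
)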
Use the external data operator to obtain the content of a script that you’ve stored in an external location, such as Azure blob storage or a public GitHub repository.
For example:
let script = externaldata(script:string)
[h'https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r']
with(format = raw);
range x from 1 to 360 step 1
| evaluate r(
    typeof(*, fx:double),
    toscalar(script),
    bag_pack('gain', 100, 'cycles', 4))
| render linechart
Install packages for the R plugin
Follow these step-by-step instructions to install packages that aren’t included in the plugin’s base image.
Prerequisites
Create a blob container to host the packages, preferably in the same region as your cluster. For example, https://artifactswestus.blob.core.windows.net/r, assuming your cluster is in West US.
Alter the cluster’s callout policy to allow access to that location. This change requires AllDatabasesAdmin permissions. For example, to enable access to a blob located in https://artifactswestus.blob.core.windows.net/r, run the following command:
.alter-merge cluster policy callout @'[ { "CalloutType": "sandbox_artifacts", "CalloutUriRegex": "artifactswestus\\.blob\\.core\\.windows\\.net/r/","CanCall": true } ]'
Install packages
The example snippets below assume a local R installation on a Windows environment.
Verify you’re using the appropriate R version – current R Sandbox version is 3.4.4:
> R.Version()["version.string"] $version.string [1] "R version 3.4.4 (2018-03-15)"
If needed, you can download it from here.
Launch the x64 RGui.
Create a new empty folder to be populated with all the relevant packages you would like to install. In this example, we install the brglm2 package, so create “C:\brglm2”.
Add the newly created folder path to lib paths:
> .libPaths("C://brglm2")
Verify that the new folder is now the first path in .libPaths():
> .libPaths() [1] "C:/brglm2" "C:/Program Files/R/R-3.4.4/library"
Once this setup is done, any package that we install will be added to this new folder. Let’s install the requested package and its dependencies:
> install.packages("brglm2")
If the question “Do you want to install from sources the packages which need compilation?” appears, answer “Y”.
Verify that new folders were added to “C:\brglm2”:
Select all items in that folder and zip them into, for example, libs.zip (do not zip the parent folder). You should get an archive structure like this:
libs.zip:
- brglm2 (folder)
- enrichwith (folder)
- numDeriv (folder)
Upload libs.zip to the blob container that was set up above.
Call the r plugin.
- Specify the external_artifacts parameter with a property bag of name and reference to the ZIP file (the blob’s URL, including a SAS token).
- In your inline R code, import zipfile from sandboxutils and call its install() method with the name of the ZIP file.
Example
Install the brglm2 package:
print x=1
| evaluate r(typeof(*, ver:string),
'library(sandboxutils)\n'
'zipfile.install("brglm2.zip")\n'
'library("brglm2")\n'
'result <- df\n'
'result$ver <-packageVersion("brglm2")\n'
,external_artifacts=bag_pack('brglm2.zip', 'https://artifactswestus.blob.core.windows.net/r/libs.zip?*** REPLACE WITH YOUR SAS TOKEN ***'))
x | ver |
---|---|
1 | 1.8.2 |
Make sure that the archive’s name (first value in pack pair) has the *.zip suffix to prevent collisions when unzipping folders whose name is identical to the archive name.
4 - Machine learning plugins
4.1 - autocluster plugin
autocluster
finds common patterns of discrete attributes (dimensions) in the data. It then reduces the results of the original query, whether it’s 100 or 100,000 rows, to a few patterns. The plugin was developed to help analyze failures (such as exceptions or crashes) but can potentially work on any filtered dataset. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate autocluster([SizeWeight [, WeightColumn [, NumSeeds [, CustomWildcard [, …]]]]])
Parameters
The parameters must be ordered as specified in the syntax. To indicate that the default value should be used, pass the tilde value ~. For more information, see Examples.
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
SizeWeight | double | A double between 0 and 1 that controls the balance between generic (high coverage) and informative (many shared) values. Increasing this value typically reduces the quantity of patterns while expanding coverage. Conversely, decreasing this value generates more specific patterns characterized by increased shared values and a smaller percentage coverage. The default is 0.5 . The formula is a weighted geometric mean with weights SizeWeight and 1-SizeWeight . | |
WeightColumn | string | Considers each row in the input according to the specified weight. Each row has a default weight of 1 . The argument must be a name of a numeric integer column. A common usage of a weight column is to take into account sampling or bucketing or aggregation of the data that is already embedded into each row. | |
NumSeeds | int | Determines the number of initial local search points. Adjusting the number of seeds impacts result quantity or quality based on data structure. Increasing seeds can enhance results but with a slower query tradeoff. Decreasing below five yields negligible improvements, while increasing above 50 rarely generates more patterns. The default is 25 . | |
CustomWildcard | string | A type literal that sets the wildcard value for a specific type in the results table, indicating no restriction on this column. The default is null , which represents an empty string. If the default is a good value in the data, a different wildcard value should be used, such as * . You can include multiple custom wildcards by adding them consecutively. |
Returns
The autocluster
plugin usually returns a small set of patterns. The patterns capture portions of the data with shared common values across multiple discrete attributes. Each pattern in the results is represented by a row.
The first column is the segment ID. The next two columns are the count and percentage of rows out of the original query that are captured by the pattern. The remaining columns are from the original query. Their value is either a specific value from the column, or a wildcard value (null by default), meaning variable values.
The patterns aren’t distinct, may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.
Examples
Using evaluate
T | evaluate autocluster()
Using autocluster
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
Output
SegmentId | Count | Percent | State | EventType | Damage |
---|---|---|---|---|---|
0 | 2278 | 38.7 | | Hail | NO |
1 | 512 | 8.7 | | Thunderstorm Wind | YES |
2 | 898 | 15.3 | TEXAS | | |
Using custom wildcards
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
Output
SegmentId | Count | Percent | State | EventType | Damage |
---|---|---|---|---|---|
0 | 2278 | 38.7 | * | Hail | NO |
1 | 512 | 8.7 | * | Thunderstorm Wind | YES |
2 | 898 | 15.3 | TEXAS | * | * |
4.2 - basket plugin
The basket
plugin finds frequent patterns of attributes in the data and returns the patterns that pass a frequency threshold in that data. A pattern represents a subset of the rows that have the same value across one or more columns. The basket
plugin is based on the Apriori algorithm originally developed for basket analysis data mining.
Syntax
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Threshold | long | A double in the range of 0.015 to 1 that sets the minimal ratio of the rows to be considered frequent. Patterns with a smaller ratio won’t be returned. The default value is 0.05. To use the default value, input the tilde: ~ .Example: `T | |
WeightColumn | string | The column name to use to consider each row in the input according to the specified weight. Must be a name of a numeric type column, such as int , long , real . By default, each row has a weight of 1. To use the default value, input the tilde: ~ . A common use of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.Example: `T | |
MaxDimensions | int | Sets the maximal number of uncorrelated dimensions per basket, limited by default, to minimize the query runtime. The default is 5. To use the default value, input the tilde: ~ .Example: `T | |
CustomWildcard | string | Sets the wildcard value for a specific type in the result table that will indicate that the current pattern doesn’t have a restriction on this column. The default is null except for string columns whose default value is an empty string. If the default is a good value in the data, a different wildcard value should be used, such as * . To use the default value, input the tilde: ~ .Example: `T |
Returns
The basket
plugin returns frequent patterns that pass a ratio threshold. The default threshold is 0.05.
Each pattern is represented by a row in the results. The first column is the segment ID. The next two columns are the count and percentage of rows from the original query that match the pattern. The remaining columns relate to the original query, with either a specific value from the column or a wildcard value (null by default), meaning a variable value.
Example
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
Output
SegmentId | Count | Percent | State | EventType | Damage | DamageCrops |
---|---|---|---|---|---|---|
0 | 4574 | 77.7 | | | NO | 0 |
1 | 2278 | 38.7 | | Hail | NO | 0 |
2 | 5675 | 96.4 | | | | 0 |
3 | 2371 | 40.3 | | Hail | | 0 |
4 | 1279 | 21.7 | | Thunderstorm Wind | | 0 |
5 | 2468 | 41.9 | | Hail | | |
6 | 1310 | 22.3 | | | YES | |
7 | 1291 | 21.9 | | Thunderstorm Wind | | |
Example with custom wildcards
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2, '~', '~', '*', int(-1))
Output
SegmentId | Count | Percent | State | EventType | Damage | DamageCrops |
---|---|---|---|---|---|---|
0 | 4574 | 77.7 | * | * | NO | 0 |
1 | 2278 | 38.7 | * | Hail | NO | 0 |
2 | 5675 | 96.4 | * | * | * | 0 |
3 | 2371 | 40.3 | * | Hail | * | 0 |
4 | 1279 | 21.7 | * | Thunderstorm Wind | * | 0 |
5 | 2468 | 41.9 | * | Hail | * | -1 |
6 | 1310 | 22.3 | * | * | YES | -1 |
7 | 1291 | 21.9 | * | Thunderstorm Wind | * | -1 |
4.3 - diffpatterns plugin
Compares two datasets of the same structure and finds patterns of discrete attributes (dimensions) that characterize differences between the two datasets. The plugin is invoked with the evaluate
operator.
diffpatterns
was developed to help analyze failures (for example, by comparing failures to non-failures in a given time frame), but can potentially find differences between any two datasets of the same structure.
Syntax
T | evaluate diffpatterns(SplitColumn, SplitValueA, SplitValueB [, WeightColumn, Threshold, MaxDimensions, CustomWildcard, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
SplitColumn | string | ✔️ | The column name that tells the algorithm how to split the query into datasets. According to the specified values for the SplitValueA and SplitValueB arguments, the algorithm splits the query into two datasets, “A” and “B”, and analyzes the differences between them. As such, the split column must have at least two distinct values. |
SplitValueA | string | ✔️ | A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered as dataset “A”. |
SplitValueB | string | ✔️ | A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered as dataset “B”. |
WeightColumn | string | The column used to consider each row in the input according to the specified weight. Must be a name of a numeric column, such as int , long , real . By default each row has a weight of ‘1’. To use the default value, input the tilde: ~ . A common usage of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.Example: `T | |
Threshold | real | A real in the range of 0.015 to 1. This value sets the minimal pattern ratio difference between the two sets. The default is 0.05. To use the default value, input the tilde: ~ .Example: `T | |
MaxDimensions | int | Sets the maximum number of uncorrelated dimensions per result pattern. By specifying a limit, you decrease the query runtime. The default is unlimited. To use the default value, input the tilde: ~ .Example: `T | |
CustomWildcard | string | Sets the wildcard value for a specific type in the result table that will indicate that the current pattern doesn’t have a restriction on this column. The default is null, except for string columns for which the default is an empty string. If the default is a viable value in the data, a different wildcard value should be used. For example, * . To use the default value, input the tilde: ~ .Example: `T |
Returns
diffpatterns
returns a small set of patterns that capture different portions of the data in the two sets (that is, a pattern capturing a large percentage of the rows in the first dataset and low percentage of the rows in the second set). Each pattern is represented by a row in the results.
The result of diffpatterns
returns the following columns:
SegmentId: the identity assigned to the pattern in the current query (note: IDs aren’t guaranteed to be the same in repeating queries).
CountA: the number of rows captured by the pattern in Set A (Set A is the equivalent of
where tostring(splitColumn) == SplitValueA
).CountB: the number of rows captured by the pattern in Set B (Set B is the equivalent of
where tostring(splitColumn) == SplitValueB
).PercentA: the percentage of rows in Set A captured by the pattern (100.0 * CountA / count(SetA)).
PercentB: the percentage of rows in Set B captured by the pattern (100.0 * CountB / count(SetB)).
PercentDiffAB: the absolute percentage point difference between A and B (|PercentA - PercentB|) is the main measure of significance of patterns in describing the difference between the two sets.
Rest of the columns: are the original schema of the input and describe the pattern, each row (pattern) represents the intersection of the non-wildcard values of the columns (equivalent of
where col1==val1 and col2==val2 and ... colN=valN
for each non-wildcard value in the row).
For each pattern, columns that aren’t set in the pattern (that is, without restriction on a specific value) contain a wildcard value, which is null by default. See the Parameters section above for how wildcards can be changed.
- Note: the patterns are often not distinct. They may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.
Example
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , 1 , 0)
| project State , EventType , Source , Damage, DamageCrops
| evaluate diffpatterns(Damage, "0", "1" )
Output
SegmentId | CountA | CountB | PercentA | PercentB | PercentDiffAB | State | EventType | Source | DamageCrops |
---|---|---|---|---|---|---|---|---|---|
0 | 2278 | 93 | 49.8 | 7.1 | 42.7 | | Hail | | 0 |
1 | 779 | 512 | 17.03 | 39.08 | 22.05 | | Thunderstorm Wind | | |
2 | 1098 | 118 | 24.01 | 9.01 | 15 | | | Trained Spotter | 0 |
3 | 136 | 158 | 2.97 | 12.06 | 9.09 | | | Newspaper | |
4 | 359 | 214 | 7.85 | 16.34 | 8.49 | | Flash Flood | | |
5 | 50 | 122 | 1.09 | 9.31 | 8.22 | IOWA | | | |
6 | 655 | 279 | 14.32 | 21.3 | 6.98 | | | Law Enforcement | |
7 | 150 | 117 | 3.28 | 8.93 | 5.65 | | Flood | | |
8 | 362 | 176 | 7.91 | 13.44 | 5.52 | | | Emergency Manager | |
4.4 - diffpatterns_text plugin
Compares two datasets of string values and finds text patterns that characterize differences between the two datasets. The plugin is invoked with the evaluate
operator.
The diffpatterns_text plugin
returns a set of text patterns that capture different portions of the data in the two sets. For example, a pattern capturing a large percentage of the rows when the condition is true
and low percentage of the rows when the condition is false
. The patterns are built from consecutive tokens separated by white space, with a token from the text column or a *
representing a wildcard. Each pattern is represented by a row in the results.
Syntax
T | evaluate diffpatterns_text(TextColumn, BooleanCondition [, MinTokens, Threshold, MaxTokens])
Parameters
Name | Type | Required | Description |
---|---|---|---|
TextColumn | string | ✔️ | The text column to analyze. |
BooleanCondition | string | ✔️ | An expression that evaluates to a boolean value. The algorithm splits the query into the two datasets to compare based on this expression. |
MinTokens | int | An integer value between 0 and 200 that represents the minimal number of non-wildcard tokens per result pattern. The default is 1. | |
Threshold | decimal | A decimal value between 0.015 and 1 that sets the minimal pattern ratio difference between the two sets. Default is 0.05. See diffpatterns. | |
MaxTokens | int | | An integer value between 0 and 20 that sets the maximal number of tokens per result pattern; specifying a lower limit decreases the query runtime. |
Returns
The result of diffpatterns_text returns the following columns:
- Count_of_True: The number of rows matching the pattern when the condition is
true
. - Count_of_False: The number of rows matching the pattern when the condition is
false
. - Percent_of_True: The percentage of rows matching the pattern from the rows when the condition is
true
. - Percent_of_False: The percentage of rows matching the pattern from the rows when the condition is
false
. - Pattern: The text pattern containing tokens from the text string and ‘
*
’ for wildcards.
Example
The following example uses data from the StormEvents table, which is publicly available in the Weather analytics sample data of the help cluster. To access this data, sign in to https://dataexplorer.azure.com/clusters/help/databases/Samples. In the left menu, browse to help > Samples > Tables > Storm_Events.
StormEvents
| where EventNarrative != "" and monthofyear(StartTime) > 1 and monthofyear(StartTime) < 9
| where EventType == "Drought" or EventType == "Extreme Cold/Wind Chill"
| evaluate diffpatterns_text(EpisodeNarrative, EventType == "Extreme Cold/Wind Chill", 2)
Output
Count_of_True | Count_of_False | Percent_of_True | Percent_of_False | Pattern |
---|---|---|---|---|
11 | 0 | 6.29 | 0 | Winds shifting northwest in * wake * a surface trough brought heavy lake effect snowfall downwind * Lake Superior from |
9 | 0 | 5.14 | 0 | Canadian high pressure settled * * region * produced the coldest temperatures since February * 2006. Durations * freezing temperatures |
0 | 34 | 0 | 6.24 | * * * * * * * * * * * * * * * * * * West Tennessee, |
0 | 42 | 0 | 7.71 | * * * * * * caused * * * * * * * * across western Colorado. * |
0 | 45 | 0 | 8.26 | * * below normal * |
0 | 110 | 0 | 20.18 | Below normal * |
5 - Query connectivity plugins
5.1 - ai_embed_text plugin (Preview)
The ai_embed_text
plugin allows embedding of text using language models, enabling various AI-related scenarios such as Retrieval Augmented Generation (RAG) applications and semantic search. The plugin supports Azure OpenAI Service embedding models accessed using managed identity.
Prerequisites
- An Azure OpenAI Service configured with managed identity
- Managed identity and callout policies configured to allow communication with Azure OpenAI services
Syntax
evaluate ai_embed_text(text, connectionString [, options [, IncludeErrorMessages]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
text | string | ✔️ | The text to embed. The value can be a column reference or a constant scalar. |
connectionString | string | ✔️ | The connection string for the language model in the format <ModelDeploymentUri>;<AuthenticationMethod> ; replace <ModelDeploymentUri> and <AuthenticationMethod> with the AI model deployment URI and the authentication method respectively. |
options | dynamic | The options that control calls to the embedding model endpoint. See Options. | |
IncludeErrorMessages | bool | Indicates whether to output errors in a new column in the output table. Default value: false . |
Options
The following table describes the options that control the way the requests are made to the embedding model endpoint.
Name | Type | Description |
---|---|---|
RecordsPerRequest | int | Specifies the number of records to process per request. Default value: 1 . |
CharsPerRequest | int | Specifies the maximum number of characters to process per request. Default value: 0 (unlimited). Azure OpenAI counts tokens, with each token approximately translating to four characters. |
RetriesOnThrottling | int | Specifies the number of retry attempts when throttling occurs. Default value: 0 . |
GlobalTimeout | timespan | Specifies the maximum time to wait for a response from the embedding model. Default value: null |
ModelParameters | dynamic | Parameters specific to the embedding model, such as embedding dimensions or user identifiers for monitoring purposes. Default value: null . |
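For illustration, here’s a hedged sketch that forwards a model-specific parameter through ModelParameters; the dimensions entry is an assumption and depends on which parameters your embedding model deployment actually supports (the endpoint is the same placeholder used in the examples below):
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
// The "dimensions" value below is an assumed example, not a documented default
let options = dynamic({"RetriesOnThrottling": 1, "GlobalTimeout": 2m, "ModelParameters": {"dimensions": 256}});
evaluate ai_embed_text('Embed this text using AI', connectionString, options)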
Configure managed identity and callout policies
To use the ai_embed_text
plugin, you must configure the following policies:
- managed identity: Allow the system-assigned managed identity to authenticate to Azure OpenAI services.
- callout: Authorize the AI model endpoint domain.
To configure these policies, use the commands in the following steps:
Configure the managed identity:
.alter-merge cluster policy managed_identity ``` [ { "ObjectId": "system", "AllowedUsages": "AzureAI" } ] ```
Configure the callout policy:
.alter-merge cluster policy callout ``` [ { "CalloutType": "azure_openai", "CalloutUriRegex": "https://[A-Za-z0-9\\-]{3,63}\\.openai\\.azure\\.com/.*", "CanCall": true } ] ```
Returns
Returns the following new embedding columns:
- A column with the _embedding suffix that contains the embedding values
- If configured to return errors, a column with the _embedding_error suffix, which contains error strings or is left empty if the operation is successful.
Depending on the input type, the plugin returns different results:
- Column reference: Returns one or more records with additional columns that are prefixed by the reference column name. For example, if the input column is named TextData, the output columns are named TextData_embedding and, if configured to return errors, TextData_embedding_error.
- Constant scalar: Returns a single record with additional columns that are not prefixed. The column names are _embedding and, if configured to return errors, _embedding_error.
Examples
The following example embeds the text Embed this text using AI
using the Azure OpenAI Embedding model.
let expression = 'Embed this text using AI';
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
evaluate ai_embed_text(expression, connectionString)
The following example embeds multiple texts using the Azure OpenAI Embedding model.
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
"RecordsPerRequest": 10,
"CharsPerRequest": 10000,
"RetriesOnThrottling": 1,
"GlobalTimeout": 2m
});
datatable(TextData: string)
[
"First text to embed",
"Second text to embed",
"Third text to embed"
]
| evaluate ai_embed_text(TextData, connectionString, options , true)
Best practices
Azure OpenAI embedding models are subject to heavy throttling, and frequent calls to this plugin can quickly reach throttling limits.
To efficiently use the ai_embed_text
plugin while minimizing throttling and costs, follow these best practices:
- Control request size: Adjust the number of records (RecordsPerRequest) and characters per request (CharsPerRequest).
- Control query timeout: Set GlobalTimeout to a value lower than the query timeout to ensure progress isn’t lost on successful calls up to that point.
- Handle rate limits more gracefully: Set retries on throttling (RetriesOnThrottling).
5.2 - azure_digital_twins_query_request plugin
The azure_digital_twins_query_request
plugin runs an Azure Digital Twins query as part of a Kusto Query Language (KQL) query. The plugin is invoked with the evaluate
operator.
Using the plugin, you can query across data in both Azure Digital Twins and any data source accessible through KQL. For example, you can perform time series analytics.
For more information about the plugin, see Azure Digital Twins query plugin.
Syntax
evaluate azure_digital_twins_query_request(AdtInstanceEndpoint, AdtQuery)
Parameters
Name | Type | Required | Description |
---|---|---|---|
AdtInstanceEndpoint | string | ✔️ | The Azure Digital Twins instance endpoint to be queried. |
AdtQuery | string | ✔️ | The query to run against the Azure Digital Twins endpoint. This query is written in a custom SQL-like query language for Azure Digital Twins, called the Azure Digital Twins query language. For more information, see Query language for Azure Digital Twins. |
Authentication and authorization
The azure_digital_twins_query_request
plugin uses the Microsoft Entra account of the user running the query to authenticate. To run a query, a user must at least be granted the Azure Digital Twins Data Reader role. Information on how to assign this role can be found in Security for Azure Digital Twins solutions.
Examples
The following examples show how you can run various Azure Digital Twins queries, including queries that use additional Kusto expressions.
Retrieval of all twins within an Azure Digital Twins instance
The following example returns all digital twins within an Azure Digital Twins instance.
evaluate azure_digital_twins_query_request(
'https://contoso.api.wcus.digitaltwins.azure.net',
'SELECT T AS Twins FROM DIGITALTWINS T')
Projection of twin properties as columns along with additional Kusto expressions
The following example returns the result from the plugin as separate columns, and then performs additional operations using Kusto expressions.
evaluate azure_digital_twins_query_request(
'https://contoso.api.wcus.digitaltwins.azure.net',
'SELECT T.Temperature, T.Humidity FROM DIGITALTWINS T WHERE IS_PRIMITIVE(T.Temperature) AND IS_PRIMITIVE(T.Humidity)')
| where Temperature > 20
| project TemperatureInC = Temperature, Humidity
Output
TemperatureInC | Humidity |
---|---|
21 | 48 |
49 | 34 |
80 | 32 |
Perform time series analytics
You can use the data history integration feature of Azure Digital Twins to historize digital twin property updates. To learn how to view the historized twin updates, see View the historized twin updates.
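As an illustrative sketch only (the ADX_Telemetry table, its columns, and the twin model ID are hypothetical and not part of this article), the twins returned by the plugin can be joined with historized telemetry stored in a Kusto table and rendered as a time series:
evaluate azure_digital_twins_query_request(
    'https://contoso.api.wcus.digitaltwins.azure.net',
    "SELECT T.$dtId AS TwinId FROM DIGITALTWINS T WHERE IS_OF_MODEL(T, 'dtmi:example:Sensor;1')")
| extend TwinId = tostring(TwinId)
| join kind=inner (
    ADX_Telemetry                 // hypothetical table populated by the data history integration
    | where TimeStamp > ago(1d)
    ) on TwinId
| summarize avg(Temperature) by TwinId, bin(TimeStamp, 15m)
| render timechart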
5.3 - cosmosdb_sql_request plugin
The cosmosdb_sql_request
plugin sends a SQL query to an Azure Cosmos DB SQL network endpoint and returns the results of the query. This plugin is primarily designed for querying small datasets, for example, enriching data with reference data stored in Azure Cosmos DB. The plugin is invoked with the evaluate
operator.
Syntax
evaluate cosmosdb_sql_request(ConnectionString, SqlQuery [, SqlParameters [, Options]]) [: OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points to the Azure Cosmos DB collection to query. It must include AccountEndpoint, Database, and Collection. It might include AccountKey if a master key is used for authentication. For more information, see Authentication and authorization.Example: 'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>' |
SqlQuery | string | ✔️ | The query to execute. |
SqlParameters | dynamic | The property bag object to pass as parameters along with the query. Parameter names must begin with @ . | |
OutputSchema | The names and types of the expected columns of the cosmosdb_sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ) . Specifying this parameter enables multiple query optimizations. | ||
Options | dynamic | A property bag object of advanced settings. If an AccountKey isn’t provided in the ConnectionString, then the armResourceId field of this parameter is required. For more information, see Supported options. |
Supported options
The following table describes the supported fields of the Options parameter.
Name | Type | Description |
---|---|---|
armResourceId | string | The Azure Resource Manager resource ID of the Cosmos DB database. If an account key isn’t provided in the connection string argument, this field is required. In such a case, the armResourceId is used to authenticate to Cosmos DB.Example: armResourceId='/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>' |
token | string | A Microsoft Entra access token of a principal with access to the Cosmos DB database. This token is used along with the armResourceId to authenticate with the Azure Resource Manager. If unspecified, the token of the principal that made the query is used.If armResourceId isn’t specified, the token is used directly to access the Cosmos DB database. For more information about the token authentication method, see Authentication and authorization. |
preferredLocations | string | The region from which to query the data.Example: ['East US'] |
Authentication and authorization
To authorize to an Azure Cosmos DB SQL network endpoint, you need to specify the authorization information. The following table provides the supported authentication methods and the description for how to use that method.
Authentication method | Description |
---|---|
Managed identity (Recommended) | Append Authentication="Active Directory Managed Identity";User Id={object_id}; to the connection string. The request is made on behalf of a managed identity which must have the appropriate permissions to the database.To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy. |
Azure Resource Manager resource ID | This authentication method requires specifying the armResourceId and optionally the token in the options. The armResourceId identifies the Cosmos DB database account, and the token must be a valid Microsoft Entra bearer token for a principal with access permissions to the Cosmos DB database. If no token is provided, the Microsoft Entra token of the requesting principal will be used for authentication. |
Account key | You can add the account key directly to the ConnectionString argument. However, this approach is less secure as it involves including the secret in the query text, and is less resilient to future changes in the account key. To enhance security, hide the secret as an obfuscated string literal. |
Token | You can add a token value in the plugin options. The token must belong to a principal with relevant permissions. To enhance security, hide the token as an obfuscated string literal. |
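For instance, a minimal sketch of the recommended managed identity method (account, database, collection, and object ID are placeholders) appends the authentication clause to the connection string:
evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;'
    'Authentication="Active Directory Managed Identity";User Id=<ObjectId>;',
    'SELECT c.Id, c.Name FROM c') : (Id:long, Name:string)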
Set callout policy
The plugin makes callouts to the Azure Cosmos DB instance. Make sure that the cluster’s callout policy enables calls of type cosmosdb
to the target CosmosDbUri.
The following example shows how to define the callout policy for Azure Cosmos DB. It’s recommended to restrict it to specific endpoints (my_endpoint1
, my_endpoint2
).
[
{
"CalloutType": "CosmosDB",
"CalloutUriRegex": "my_endpoint1\\.documents\\.azure\\.com",
"CanCall": true
},
{
"CalloutType": "CosmosDB",
"CalloutUriRegex": "my_endpoint2\\.documents\\.azure\\.com",
"CanCall": true
}
]
The following example shows an alter callout policy command for cosmosdb
CalloutType
.alter cluster policy callout @'[{"CalloutType": "cosmosdb", "CalloutUriRegex": "\\.documents\\.azure\\.com", "CanCall": true}]'
Examples
The following examples use placeholder text, in brackets.
Query Azure Cosmos DB with a query-defined output schema
The following example uses the cosmosdb_sql_request plugin to send a SQL query while selecting only specific columns. This query uses explicit schema definitions that allow various optimizations before the actual query is run against Cosmos DB.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
'SELECT c.Id, c.Name from c') : (Id:long, Name:string)
Query Azure Cosmos DB
The following example uses the cosmosdb_sql_request plugin to send a SQL query to fetch data from Azure Cosmos DB using its Azure Cosmos DB for NoSQL.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
'SELECT * from c') // OutputSchema is unknown, so it is not specified. This may harm the performance of the query.
Query Azure Cosmos DB with parameters
The following example uses SQL query parameters and queries the data from an alternate region. For more information, see preferredLocations
.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
"SELECT c.id, c.lastName, @param0 as Column0 FROM c WHERE c.dob >= '1970-01-01T00:00:00Z'",
dynamic({'@param0': datetime(2019-04-16 16:47:26.7423305)}),
dynamic({'preferredLocations': ['East US']})) : (Id:long, Name:string, Column0: datetime)
| where lastName == 'Smith'
Query Azure Cosmos DB and join data with a database table
The following example joins partner data from an Azure Cosmos DB with partner data in a database using the Partner
field. It results in a list of partners with their phone numbers, website, and contact email address sorted by partner name.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
"SELECT c.id, c.Partner, c. phoneNumber FROM c') : (Id:long, Partner:string, phoneNumber:string)
| join kind=innerunique Partner on Partner
| project id, Partner, phoneNumber, website, Contact
| sort by Partner
Query Azure Cosmos DB using token authentication
The following example queries Azure Cosmos DB using a Microsoft Entra access token, supplied through the plugin options, for authentication. It selects specific columns and specifies an explicit output schema.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
"SELECT c.Id, c.Name, c.City FROM c",
dynamic(null),
dynamic({'token': h'abc123...'})
) : (Id:long, Name:string, City:string)
Query Azure Cosmos DB using Azure Resource Manager resource ID for authentication
The following example uses the Azure Resource Manager resource ID for authentication and the Microsoft Entra token of the requesting principal, since a token isn’t specified. It sends a SQL query while selecting only specific columns and specifies explicit schema definitions.
evaluate cosmosdb_sql_request(
'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
"SELECT c.Id, c.Name, c.City FROM c",
dynamic({'armResourceId': '/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>'})
) : (Id:long, Name:string, City:string)
5.4 - http_request plugin
The http_request
plugin sends an HTTP GET request and converts the response into a table.
Prerequisites
- Run
.enable plugin http_request
to enable the plugin - Set the URI to access as an allowed destination for
webapi
in the Callout policy
Syntax
evaluate http_request(Uri [, RequestHeaders [, Options]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Uri | string | ✔️ | The destination URI for the HTTP or HTTPS request. |
RequestHeaders | dynamic | A property bag containing HTTP headers to send with the request. | |
Options | dynamic | A property bag containing additional properties of the request. |
Authentication and authorization
To authenticate, use the HTTP standard Authorization
header or any custom header supported by the web service.
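For example, the following hedged sketch (placeholder endpoint and token) passes an Authorization header through the RequestHeaders argument:
let uri = 'https://example.com/api/resource';
let headers = dynamic({'Authorization': 'Bearer ...token-for-target-endpoint...'});
evaluate http_request(uri, headers)
| project ResponseBody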
Returns
The plugin returns a table that has a single record with the following dynamic columns:
- ResponseHeaders: A property bag with the response header.
- ResponseBody: The response body parsed as a value of type
dynamic
.
If the HTTP response indicates (via the Content-Type
response header) that the media type is application/json
,
the response body is automatically parsed as-if it’s a JSON object. Otherwise, it’s returned as-is.
Headers
The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:
Name | Description |
---|---|
x-ms-client-request-id | A correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID. |
x-ms-readonly | A flag indicating that the processor of this request shouldn’t make any persistent changes. |
Example
The following example retrieves Azure retail prices for Azure Purview in West Europe:
let Uri = "https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Azure Purview' and location eq 'EU West'";
evaluate http_request(Uri)
| project ResponseBody.Items
| mv-expand ResponseBody_Items
| evaluate bag_unpack(ResponseBody_Items)
Output
armRegionName | armSkuName | currencyCode | effectiveStartDate | isPrimaryMeterRegion | location | meterId | meterName | productId | productName | retailPrice | serviceFamily | serviceId | serviceName | skuId | skuName | tierMinimumUnits | type | unitOfMeasure | unitPrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
westeurope | Data Insights | USD | 2022-06-01T00:00:00Z | false | EU West | 8ce915f7-20db-564d-8cc3-5702a7c952ab | Data Insights Report Consumption | DZH318Z08M22 | Azure Purview Data Map | 0.21 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/006C | Catalog Insights | 0 | Consumption | 1 API Calls | 0.21 |
westeurope | Data Map Enrichment - Data Insights Generation | USD | 2022-06-01T00:00:00Z | false | EU West | 7ce2db1d-59a0-5193-8a57-0431a10622b6 | Data Map Enrichment - Data Insights Generation vCore | DZH318Z08M22 | Azure Purview Data Map | 0.82 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/005C | Data Map Enrichment - Insight Generation | 0 | Consumption | 1 Hour | 0.82 |
westeurope | | USD | 2021-09-28T00:00:00Z | false | EU West | 053e2dcb-82c0-5e50-86cd-1f1c8d803705 | Power BI vCore | DZH318Z08M23 | Azure Purview Scanning Ingestion and Classification | 0 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M23/0005 | Power BI | 0 | Consumption | 1 Hour | 0 |
westeurope | | USD | 2021-09-28T00:00:00Z | false | EU West | a7f57f26-5f31-51e5-a5ed-ffc2b0da37b9 | Resource Set vCore | DZH318Z08M22 | Azure Purview Data Map | 0.21 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/000X | Resource Set | 0 | Consumption | 1 Hour | 0.21 |
westeurope | | USD | 2021-09-28T00:00:00Z | false | EU West | 5d157295-441c-5ea7-ba7c-5083026dc456 | SQL Server vCore | DZH318Z08M23 | Azure Purview Scanning Ingestion and Classification | 0 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M23/000F | SQL Server | 0 | Consumption | 1 Hour | 0 |
westeurope | | USD | 2021-09-28T00:00:00Z | false | EU West | 0745df0d-ce4f-52db-ac31-ac574d4dcfe5 | Standard Capacity Unit | DZH318Z08M22 | Azure Purview Data Map | 0.411 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M22/0002 | Standard | 0 | Consumption | 1 Hour | 0.411 |
westeurope | | USD | 2021-09-28T00:00:00Z | false | EU West | 811e3118-5380-5ee8-a5d9-01d48d0a0627 | Standard vCore | DZH318Z08M23 | Azure Purview Scanning Ingestion and Classification | 0.63 | Analytics | DZH318Q66D0F | Azure Purview | DZH318Z08M23/0009 | Standard | 0 | Consumption | 1 Hour | 0.63 |
5.5 - http_request_post plugin
The http_request_post
plugin sends an HTTP POST request and converts the response into a table.
Prerequisites
- Run
.enable plugin http_request_post
to enable the plugin - Set the URI to access as an allowed destination for
webapi
in the Callout policy
Syntax
evaluate http_request_post(Uri [, RequestHeaders [, Options [, Content]]])
Parameters
Name | Type | Required | Description |
---|---|---|---|
Uri | string | ✔️ | The destination URI for the HTTP or HTTPS request. |
RequestHeaders | dynamic | A property bag containing HTTP headers to send with the request. | |
Options | dynamic | A property bag containing additional properties of the request. | |
Content | string | The body content to send with the request. The content is encoded in UTF-8 and the media type for the Content-Type attribute is application/json . |
Authentication and authorization
To authenticate, use the HTTP standard Authorization
header or any custom header supported by the web service.
Returns
The plugin returns a table that has a single record with the following dynamic columns:
- ResponseHeaders: A property bag with the response header.
- ResponseBody: The response body parsed as a value of type
dynamic
.
If the HTTP response indicates (via the Content-Type
response header) that the media type is application/json
,
the response body is automatically parsed as-if it’s a JSON object. Otherwise, it’s returned as-is.
Headers
The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:
Name | Description |
---|---|
x-ms-client-request-id | A correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID. |
x-ms-readonly | A flag indicating that the processor of this request shouldn’t make any persistent changes. |
Example
The following example is for a hypothetical HTTPS web service that accepts additional request headers and requires authentication using Microsoft Entra ID:
let uri='https://example.com/node/js/on/eniac';
let headers=dynamic({'x-ms-correlation-vector':'abc.0.1.0', 'authorization':'bearer ...Azure-AD-bearer-token-for-target-endpoint...'});
evaluate http_request_post(uri, headers)
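The example above doesn’t pass a body. The following hedged sketch (the payload and options are hypothetical) adds the Content argument, which is sent as UTF-8 with media type application/json:
let uri = 'https://example.com/node/js/on/eniac';
let headers = dynamic({'x-ms-correlation-vector':'abc.0.1.0', 'authorization':'bearer ...Azure-AD-bearer-token-for-target-endpoint...'});
let options = dynamic({});
let content = '{"name": "contoso", "value": 42}';
evaluate http_request_post(uri, headers, options, content)
| project ResponseBody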
5.6 - mysql_request plugin
The mysql_request
plugin sends a SQL query to an Azure MySQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.
The plugin is invoked with the evaluate
operator.
Syntax
evaluate mysql_request(ConnectionString, SqlQuery [, SqlParameters]) [: OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points at the MySQL Server network endpoint. See authentication and how to specify the network endpoint. |
SqlQuery | string | ✔️ | The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query. |
SqlParameters | dynamic | A property bag object that holds key-value pairs to pass as parameters along with the query. | |
OutputSchema | The names and types for the expected columns of the mysql_request plugin output.Syntax: ( ColumnName : ColumnType [, …] ) |
Authentication and authorization
To authorize to a MySQL Server network endpoint, you need to specify the authorization information in the connection string. The supported authorization method is via username and password.
Set callout policy
The plugin makes callouts to the MySql database. Make sure that the cluster’s callout policy enables calls of type mysql
to the target MySqlDbUri.
The following example shows how to define the callout policy for MySQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1
, my_endpoint2
).
[
{
"CalloutType": "mysql",
"CalloutUriRegex": "my_endpoint1\\.mysql\\.database\\.azure\\.com",
"CanCall": true
},
{
"CalloutType": "mysql",
"CalloutUriRegex": "my_endpoint2\\.mysql\\.database\\.azure\\.com",
"CanCall": true
}
]
The following example shows an .alter callout policy
command for mysql
CalloutType:
.alter cluster policy callout @'[{"CalloutType": "mysql", "CalloutUriRegex": "\\.mysql\\.database\\.azure\\.com", "CanCall": true}]'
Username and password authentication
The mysql_request
plugin only supports username and password authentication to the MySQL server endpoint and doesn’t integrate with Microsoft Entra authentication.
The username and password are provided as part of the connection string using the following parameters:
User ID=...; Password=...;
Encryption and server validation
For security, SslMode
is unconditionally set to Required
when connecting to a MySQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.
Specify the network endpoint
Specify the MySQL network endpoint as part of the connection string.
Syntax:
Server
=
FQDN [Port
=
Port]
Where:
- FQDN is the fully qualified domain name of the endpoint.
- Port is the TCP port of the endpoint. By default,
3306
is assumed.
Examples
SQL query to Azure MySQL DB
The following example sends a SQL query to an Azure MySQL database. It retrieves all records from [dbo].[Table]
, and then processes the results.
evaluate mysql_request(
'Server=contoso.mysql.database.azure.com; Port = 3306;'
'Database=Fabrikam;'
h'UID=USERNAME;'
h'Pwd=PASSWORD;',
'select * from `dbo`.`Table`') : (Id: int, Name: string)
| where Id > 0
| project Name
SQL query to an Azure MySQL database with modifications
The following example sends a SQL query to an Azure MySQL database
retrieving all records from [dbo].[Table]
, while appending another datetime
column,
and then processes the results on the Kusto side.
It specifies a SQL parameter (@param0
) to be used in the SQL query.
evaluate mysql_request(
'Server=contoso.mysql.database.azure.com; Port = 3306;'
'Database=Fabrikam;'
h'UID=USERNAME;'
h'Pwd=PASSWORD;',
'select *, @param0 as dt from `dbo`.`Table`',
dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name
SQL query to an Azure MySQL database without a query-defined output schema
The following example sends a SQL query to an Azure MySQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.
evaluate mysql_request(
'Server=contoso.mysql.database.azure.com; Port = 3306;'
'Database=Fabrikam;'
h'UID=USERNAME;'
h'Pwd=PASSWORD;',
'select * from `dbo`.`Table`')
| where Id > 0
| project Name
5.7 - postgresql_request plugin
The postgresql_request
plugin sends a SQL query to an Azure PostgreSQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.
The plugin is invoked with the evaluate
operator.
Syntax
evaluate
postgresql_request
(
ConnectionString ,
SqlQuery [,
SqlParameters] )
[:
OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points at the PostgreSQL Server network endpoint. See authentication and how to specify the network endpoint. |
SqlQuery | string | ✔️ | The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query. |
SqlParameters | dynamic | A property bag object that holds key-value pairs to pass as parameters along with the query. | |
OutputSchema | The names and types for the expected columns of the postgresql_request plugin output.Syntax: ( ColumnName : ColumnType [, …] ) |
Authentication and authorization
To authorize a PostgreSQL Server network endpoint, you must specify the authorization information in the connection string. The supported authorization method is via username and password.
Set callout policy
The plugin makes callouts to the PostgreSQL database. Make sure that the cluster’s callout policy enables calls of type postgresql
to the target PostgreSqlDbUri.
The following example shows how to define the callout policy for PostgreSQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1
, my_endpoint2
).
[
{
"CalloutType": "postgresql",
"CalloutUriRegex": "my_endpoint1\\.postgres\\.database\\.azure\\.com",
"CanCall": true
},
{
"CalloutType": "postgresql",
"CalloutUriRegex": "my_endpoint2\\.postgres\\.database\\.azure\\.com",
"CanCall": true
}
]
The following example shows a .alter callout policy
command for postgresql
CalloutType:
.alter cluster policy callout @'[{"CalloutType": "postgresql", "CalloutUriRegex": "\\.postgres\\.database\\.azure\\.com", "CanCall": true}]'
Username and password authentication
The postgresql_request
plugin only supports username and password authentication to the PostgreSQL server endpoint and doesn’t integrate with Microsoft Entra authentication.
The username and password are provided as part of the connection string using the following parameters:
User ID=...; Password=...;
Encryption and server validation
For security, SslMode
is unconditionally set to Required
when connecting to a PostgreSQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.
Specify the network endpoint
Specify the PostgreSQL network endpoint as part of the connection string.
Syntax:
Host
=
FQDN [Port
=
Port]
Where:
- FQDN is the fully qualified domain name of the endpoint.
- Port is the TCP port of the endpoint.
Examples
SQL query to Azure PostgreSQL DB
The following example sends a SQL query to an Azure PostgreSQL database. It retrieves all records from public."Table"
, and then processes the results.
evaluate postgresql_request(
'Host=contoso.postgres.database.azure.com; Port = 5432;'
'Database=Fabrikam;'
h'User Id=USERNAME;'
h'Password=PASSWORD;',
'select * from public."Table"') : (Id: int, Name: string)
| where Id > 0
| project Name
SQL query to an Azure PostgreSQL database with modifications
The following example sends a SQL query to an Azure PostgreSQL database
retrieving all records from public."Table"
, while appending another datetime
column,
and then processes the results.
It specifies a SQL parameter (@param0
) to be used in the SQL query.
evaluate postgresql_request(
'Host=contoso.postgres.database.azure.com; Port = 5432;'
'Database=Fabrikam;'
h'User Id=USERNAME;'
h'Password=PASSWORD;',
'select *, @param0 as dt from public."Table"',
dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id: int, Name: string, dt: datetime)
| where Id > 0
| project Name
SQL query to an Azure PostgreSQL database without a query-defined output schema
The following example sends a SQL query to an Azure PostgreSQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.
evaluate postgresql_request(
'Host=contoso.postgres.database.azure.com; Port = 5432;'
'Database=Fabrikam;'
h'User Id=USERNAME;'
h'Password=PASSWORD;',
'select * from public."Table"')
| where Id > 0
| project Name
5.8 - sql_request plugin
The sql_request
plugin sends a SQL query to an Azure SQL Server network endpoint and returns the results.
If more than one rowset is returned by SQL, only the first one is used.
The plugin is invoked with the evaluate
operator.
Syntax
evaluate
sql_request
(
ConnectionString ,
SqlQuery [,
SqlParameters [,
Options]] )
[:
OutputSchema]
Parameters
Name | Type | Required | Description |
---|---|---|---|
ConnectionString | string | ✔️ | The connection string that points at the SQL Server network endpoint. See valid methods of authentication and how to specify the network endpoint. |
SqlQuery | string | ✔️ | The query that is to be executed against the SQL endpoint. The query must return one or more row sets, but only the first one is made available for the rest of the Kusto query. |
SqlParameters | dynamic | A property bag of key-value pairs to pass as parameters along with the query. | |
Options | dynamic | A property bag of key-value pairs to pass more advanced settings along with the query. Currently, only token can be set, to pass a caller-provided Microsoft Entra access token that is forwarded to the SQL endpoint for authentication. | |
OutputSchema | string | The names and types for the expected columns of the sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ) . |
Authentication and authorization
The sql_request plugin supports the following methods of authentication to the SQL Server endpoint.
|Authentication method|Syntax|How|Description|
|--|--|--|--|
|Microsoft Entra integrated|Authentication="Active Directory Integrated"
|Add to the ConnectionString parameter.| The user or application authenticates via Microsoft Entra ID to your cluster, and the same token is used to access the SQL Server network endpoint.
The principal must have the appropriate permissions on the SQL resource to perform the requested action. For example, to read from the database the principal needs table SELECT permissions, and to write to an existing table the principal needs UPDATE and INSERT permissions. To write to a new table, CREATE permissions are also required.|
|Managed identity|Authentication="Active Directory Managed Identity";User Id={object_id}
|Add to the ConnectionString parameter.| The request is executed on behalf of a managed identity. The managed identity must have the appropriate permissions on the SQL resource to perform the requested action.
To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy. |
|Username and password|User ID=...; Password=...;
|Add to the ConnectionString parameter.|When possible, avoid this method as it may be less secure.|
|Microsoft Entra access token|dynamic({'token': h"eyJ0..."})
|Add in the Options parameter.|The access token is passed as token
property in the Options argument of the plugin.|
Examples
Send a SQL query using Microsoft Entra integrated authentication
The following example sends a SQL query to an Azure SQL DB database. It
retrieves all records from [dbo].[Table]
, and then processes the results on the
Kusto side. Authentication reuses the calling user’s Microsoft Entra token.
evaluate sql_request(
'Server=tcp:contoso.database.windows.net,1433;'
'Authentication="Active Directory Integrated";'
'Initial Catalog=Fabrikam;',
'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name
Send a SQL query using Username/Password authentication
The following example is identical to the previous one, except that SQL authentication is done by username/password. For confidentiality, we use obfuscated strings here.
evaluate sql_request(
'Server=tcp:contoso.database.windows.net,1433;'
'Initial Catalog=Fabrikam;'
h'User ID=USERNAME;'
h'Password=PASSWORD;',
'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name
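Send a SQL query using managed identity authentication
The following is a minimal sketch of the managed identity method listed in the table above; the User Id value is a placeholder for the managed identity's object ID, and it assumes the managed identity has already been added to the cluster and allowed by the Managed Identity policy.
// The User Id value below is a placeholder object ID.
evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
  'Authentication="Active Directory Managed Identity";'
  'User Id=00000000-0000-0000-0000-000000000000;'
  'Initial Catalog=Fabrikam;',
  'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name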
Send a SQL query using a Microsoft Entra access token
The following example sends a SQL query to an Azure SQL database, retrieving all records from [dbo].[Table] while appending another datetime column, and then processes the results on the Kusto side. It specifies a SQL parameter (@param0) to be used in the SQL query, and authenticates with a caller-provided Microsoft Entra access token passed as the token property of the Options argument.
evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
  'Initial Catalog=Fabrikam;',
  'select *, @param0 as dt from [dbo].[Table]',
  dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)}),
  dynamic({'token': h"eyJ0..."})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name
Send a SQL query without a query-defined output schema
The following example sends a SQL query to an Azure SQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.
evaluate sql_request(
'Server=tcp:contoso.database.windows.net,1433;'
'Initial Catalog=Fabrikam;'
h'User ID=USERNAME;'
h'Password=PASSWORD;',
'select * from [dbo].[Table]')
| where Id > 0
| project Name
Encryption and server validation
The following connection properties are forced when connecting to a SQL Server network endpoint, for security reasons.
- Encrypt is set to true unconditionally.
- TrustServerCertificate is set to false unconditionally.
As a result, the SQL Server must be configured with a valid SSL/TLS server certificate.
Specify the network endpoint
Specifying the SQL network endpoint as part of the connection string is mandatory. The appropriate syntax is:
Server
=
tcp:
FQDN [,
Port]
Where:
- FQDN is the fully qualified domain name of the endpoint.
- Port is the TCP port of the endpoint. By default,
1433
is assumed.
6 - User and sequence analytics plugins
6.1 - active_users_count plugin
Calculates distinct count of values, where each value has appeared in at least a minimum number of periods in a lookback period.
Useful for calculating distinct counts of “fans” only, while not including appearances of “non-fans”. A user is counted as a “fan” only if it was active during the lookback period. The lookback period is only used to determine whether a user is considered active
(“fan”) or not. The aggregation itself doesn’t include users from the lookback window. In comparison, the sliding_window_counts aggregation is performed over a sliding window of the lookback period.
Syntax
T | evaluate
active_users_count(
IdColumn,
TimelineColumn,
Start,
End,
LookbackWindow,
Period,
ActivePeriodsCount,
Bin ,
[dim1,
dim2,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input used to count active users. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents timeline. |
Start | datetime | ✔️ | The analysis start period. |
End | datetime | ✔️ | The analysis end period. |
LookbackWindow | timespan | ✔️ | The time window defining a period where user appearance is checked. The lookback period starts at ([current appearance] - [lookback window]) and ends on ([current appearance]). |
Period | timespan | ✔️ | A constant defining a single appearance period (a user is counted as active if it appears in at least ActivePeriodsCount distinct periods of this timespan). |
ActivePeriodsCount | decimal | ✔️ | The minimal number of distinct active periods to decide if user is active. Active users are those users who appeared in at least (equal or greater than) active periods count. |
Bin | decimal, datetime, or timespan | ✔️ | A constant value of the analysis step period. May also be a string of week , month , or year . All periods will be the corresponding startofweek, startofmonth, or startofyear functions. |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. |
Returns
Returns a table with the distinct count of IDs that appeared in at least ActivePeriodsCount periods within the lookback window, for each timeline period and each existing dimensions combination.
Output table schema is:
TimelineColumn | dim1 | .. | dim_n | dcount_values |
---|---|---|---|---|
type: as of TimelineColumn | .. | .. | .. | long |
Examples
Calculate weekly number of distinct users that appeared in at least three different days over a period of prior eight days. Period of analysis: July 2018.
let Start = datetime(2018-07-01);
let End = datetime(2018-07-31);
let LookbackWindow = 8d;
let Period = 1d;
let ActivePeriods = 3;
let Bin = 7d;
let T = datatable(User:string, Timestamp:datetime)
[
"B", datetime(2018-06-29),
"B", datetime(2018-06-30),
"A", datetime(2018-07-02),
"B", datetime(2018-07-04),
"B", datetime(2018-07-08),
"A", datetime(2018-07-10),
"A", datetime(2018-07-14),
"A", datetime(2018-07-17),
"A", datetime(2018-07-20),
"B", datetime(2018-07-24)
];
T | evaluate active_users_count(User, Timestamp, Start, End, LookbackWindow, Period, ActivePeriods, Bin)
Output
Timestamp | dcount |
---|---|
2018-07-01 00:00:00.0000000 | 1 |
2018-07-15 00:00:00.0000000 | 1 |
A user is considered active if it fulfills both of the following criteria:
- The user was seen in at least three distinct days (Period = 1d, ActivePeriods=3).
- The user was seen in a lookback window of 8d before and including their current appearance.
The only appearances that are active by these criteria are the following instances: User A on 7/20 and User B on 7/4 (see the plugin results above). The appearances of User B are included for the lookback window on 7/4, but not for the Start-End time range of 6/29-30.
6.2 - activity_counts_metrics plugin
Calculates useful activity metrics for each time window compared/aggregated to all previous time windows. Metrics include: total count values, distinct count values, distinct count of new values, and aggregated distinct count. Compare this plugin to activity_metrics plugin, in which every time window is compared to its previous time window only.
Syntax
T | evaluate
activity_counts_metrics(
IdColumn,
TimelineColumn,
Start,
End,
Step [,
Dimensions])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input used to count activities. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents the timeline. |
Start | datetime | ✔️ | The analysis start period. |
End | datetime | ✔️ | The analysis end period. |
Step | decimal, datetime, or timespan | ✔️ | The analysis window period. The value may also be a string of week , month , or year , in which case all periods would be startofweek, startofmonth, or startofyear. |
Dimensions | string | Zero or more comma-separated dimensions columns that slice the activity metrics calculation. |
Returns
Returns a table that has the total count values, distinct count values, distinct count of new values, and aggregated distinct count for each time window. If Dimensions are provided, then there’s another column for each dimension in the output table.
The following table describes the output table schema.
Column name | Type | Description |
---|---|---|
Timestamp | Same as the provided TimelineColumn argument | The time window start time. |
count | long | The total records count in the time window and dim(s) |
dcount | long | The distinct ID values count in the time window and dim(s) |
new_dcount | long | The distinct ID values in the time window and dim(s) compared to all previous time windows. |
aggregated_dcount | long | The total aggregated distinct ID values of dim(s) from first-time window to current (inclusive). |
Examples
Daily activity counts
The next query calculates daily activity counts for the provided input table.
let start=datetime(2017-08-01);
let end=datetime(2017-08-04);
let window=1d;
let T = datatable(UserId:string, Timestamp:datetime)
[
'A', datetime(2017-08-01),
'D', datetime(2017-08-01),
'J', datetime(2017-08-01),
'B', datetime(2017-08-01),
'C', datetime(2017-08-02),
'T', datetime(2017-08-02),
'J', datetime(2017-08-02),
'H', datetime(2017-08-03),
'T', datetime(2017-08-03),
'T', datetime(2017-08-03),
'J', datetime(2017-08-03),
'B', datetime(2017-08-03),
'S', datetime(2017-08-03),
'S', datetime(2017-08-04),
];
T
| evaluate activity_counts_metrics(UserId, Timestamp, start, end, window)
Output
Timestamp | count | dcount | new_dcount | aggregated_dcount |
---|---|---|---|---|
2017-08-01 00:00:00.0000000 | 4 | 4 | 4 | 4 |
2017-08-02 00:00:00.0000000 | 3 | 3 | 2 | 6 |
2017-08-03 00:00:00.0000000 | 6 | 5 | 2 | 8 |
2017-08-04 00:00:00.0000000 | 1 | 1 | 0 | 8 |
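The Dimensions argument isn't exercised above. The following is a minimal sketch, assuming a hypothetical Device column, that slices the same kind of daily counts by a dimension; the output then contains an additional Device column:
let start=datetime(2017-08-01);
let end=datetime(2017-08-04);
let window=1d;
let T = datatable(UserId:string, Timestamp:datetime, Device:string)
[
    'A', datetime(2017-08-01), 'Mobile',
    'D', datetime(2017-08-01), 'Desktop',
    'J', datetime(2017-08-01), 'Mobile',
    'C', datetime(2017-08-02), 'Mobile',
    'J', datetime(2017-08-02), 'Desktop',
    'H', datetime(2017-08-03), 'Desktop',
];
T
| evaluate activity_counts_metrics(UserId, Timestamp, start, end, window, Device)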
6.3 - activity_engagement plugin
Calculates activity engagement ratio based on ID column over a sliding timeline window.
The activity_engagement plugin can be used for calculating DAU/WAU/MAU (daily/weekly/monthly activities).
Syntax
T | evaluate
activity_engagement(
IdColumn,
TimelineColumn,
[Start,
End,
] InnerActivityWindow,
OuterActivityWindow [,
dim1,
dim2,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The tabular input used to calculate engagement. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents timeline. |
Start | datetime | The analysis start period. | |
End | datetime | The analysis end period. | |
InnerActivityWindow | timespan | ✔️ | The inner-scope analysis window period. |
OuterActivityWindow | timespan | ✔️ | The outer-scope analysis window period. |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. |
Returns
Returns a table that has a distinct count of ID values inside an inner-scope window, inside an outer-scope window, and the activity ratio for each inner-scope window period for each existing dimensions combination.
Output table schema is:
TimelineColumn | dcount_activities_inner | dcount_activities_outer | activity_ratio | dim1 | .. | dim_n | |||
---|---|---|---|---|---|---|---|---|---|
type: as of TimelineColumn | long | long | double | .. | .. | .. |
Examples
DAU/WAU calculation
The following example calculates the DAU/WAU (Daily Active Users / Weekly Active Users) ratio over randomly generated data.
// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-01-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/WAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 7d)
| project _day, Dau_Wau=activity_ratio*100
| render timechart
:::image type="content" source="media/activity-engagement-plugin/activity-engagement-dau-wau.png" border="false" alt-text="Graph displaying the ratio of daily active users to weekly active users as specified in the query.":::
DAU/MAU calculation
The following example calculates the DAU/MAU (Daily Active Users / Monthly Active Users) ratio over randomly generated data.
// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d)
| project _day, Dau_Mau=activity_ratio*100
| render timechart
:::image type="content" source="media/activity-engagement-plugin/activity-engagement-dau-mau.png" border="false" alt-text="Graph displaying the ratio of daily active users to monthly active users as specified in the query.":::
DAU/MAU calculation with additional dimensions
The following example calculates the DAU/MAU (Daily Active Users / Monthly Active Users) ratio over randomly generated data with an additional dimension (mod3).
// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
| extend mod3 = strcat("mod3=", id % 3)
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d, mod3)
| project _day, Dau_Mau=activity_ratio*100, mod3
| render timechart
:::image type="content" source="media/activity-engagement-plugin/activity-engagement-dau-mau-mod3.png" border="false" alt-text="Graph displaying the ratio of daily active users to monthly active users with modulo 3 as specified in the query.":::
6.4 - activity_metrics plugin
Calculates useful metrics that include distinct count values, distinct count of new values, retention rate, and churn rate. This plugin is different from activity_counts_metrics plugin in which every time window is compared to all previous time windows.
Syntax
T | evaluate
activity_metrics(
IdColumn,
TimelineColumn,
[Start,
End,
] Window [,
dim1,
dim2,
…])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input used to calculate activity metrics. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents timeline. |
Start | datetime | ✔️ | The analysis start period. |
End | datetime | ✔️ | The analysis end period. |
Window | decimal, datetime, or timespan | ✔️ | The analysis window period. This value may also be a string of week , month , or year , in which case all periods will be startofweek, startofmonth, or startofyear respectively. |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. |
Returns
The plugin returns a table with the distinct count values, distinct count of new values, retention rate, and churn rate for each timeline period for each existing dimensions combination.
Output table schema is:
TimelineColumn | dcount_values | dcount_newvalues | retention_rate | churn_rate | dim1 | .. | dim_n | |||
---|---|---|---|---|---|---|---|---|---|---|
type: as of TimelineColumn | long | long | double | double | .. | .. | .. |
Notes
Retention Rate Definition
Retention Rate over a period is calculated as:
Retention Rate = (# of customers returned during the period) / (# of customers at the beginning of the period)
where the # of customers returned during the period is defined as:
# of customers returned during the period = # of customers who were active in the previous period and are also active in the current period
Retention Rate can vary from 0.0 to 1.0.
A higher score means a larger number of returning users.
Churn Rate Definition
Churn Rate over a period is calculated as:
Churn Rate = (# of customers lost in the period) / (# of customers at the beginning of the period)
where the # of customers lost in the period is defined as:
# of customers lost in the period = # of customers who were active in the previous period but aren't active in the current period
Churn Rate can vary from 0.0 to 1.0.
A higher score means a larger number of users aren't returning to the service.
Churn vs. Retention Rate
The relationship between churn rate and retention rate follows from their definitions. The following calculation is always true:
Churn Rate + Retention Rate = 1.0
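As an illustrative example: if 200 distinct users were active at the beginning of the period (that is, in the previous period) and 150 of them are also active in the current period, then Retention Rate = 150 / 200 = 0.75, Churn Rate = 50 / 200 = 0.25, and the two rates sum to 1.0.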
Examples
Weekly retention rate and churn rate
The next query calculates retention and churn rates for a week-over-week window.
// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, retention_rate, churn_rate
| render timechart
Output
_day | retention_rate | churn_rate |
---|---|---|
2017-01-02 00:00:00.0000000 | NaN | NaN |
2017-01-09 00:00:00.0000000 | 0.179910044977511 | 0.820089955022489 |
2017-01-16 00:00:00.0000000 | 0.744374437443744 | 0.255625562556256 |
2017-01-23 00:00:00.0000000 | 0.612096774193548 | 0.387903225806452 |
2017-01-30 00:00:00.0000000 | 0.681141439205955 | 0.318858560794045 |
2017-02-06 00:00:00.0000000 | 0.278145695364238 | 0.721854304635762 |
2017-02-13 00:00:00.0000000 | 0.223172628304821 | 0.776827371695179 |
2017-02-20 00:00:00.0000000 | 0.38 | 0.62 |
2017-02-27 00:00:00.0000000 | 0.295519001701645 | 0.704480998298355 |
2017-03-06 00:00:00.0000000 | 0.280387770320656 | 0.719612229679344 |
2017-03-13 00:00:00.0000000 | 0.360628154795289 | 0.639371845204711 |
2017-03-20 00:00:00.0000000 | 0.288008028098344 | 0.711991971901656 |
2017-03-27 00:00:00.0000000 | 0.306134969325153 | 0.693865030674847 |
2017-04-03 00:00:00.0000000 | 0.356866537717602 | 0.643133462282398 |
2017-04-10 00:00:00.0000000 | 0.495098039215686 | 0.504901960784314 |
2017-04-17 00:00:00.0000000 | 0.198296836982968 | 0.801703163017032 |
2017-04-24 00:00:00.0000000 | 0.0618811881188119 | 0.938118811881188 |
2017-05-01 00:00:00.0000000 | 0.204657727593507 | 0.795342272406493 |
2017-05-08 00:00:00.0000000 | 0.517391304347826 | 0.482608695652174 |
2017-05-15 00:00:00.0000000 | 0.143667296786389 | 0.856332703213611 |
2017-05-22 00:00:00.0000000 | 0.199122325836533 | 0.800877674163467 |
2017-05-29 00:00:00.0000000 | 0.063468992248062 | 0.936531007751938 |
:::image type="content" source="media/activity-metrics-plugin/activity-metrics-churn-and-retention.png" border="false" alt-text="Table showing the calculated retention and churn rates per seven days as specified in the query.":::
Distinct values and distinct ’new’ values
The next query calculates distinct values and 'new' values (IDs that didn't appear in the previous time window) for a week-over-week window.
// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, dcount_values, dcount_newvalues
| render timechart
Output
_day | dcount_values | dcount_newvalues |
---|---|---|
2017-01-02 00:00:00.0000000 | 630 | 630 |
2017-01-09 00:00:00.0000000 | 738 | 575 |
2017-01-16 00:00:00.0000000 | 1187 | 841 |
2017-01-23 00:00:00.0000000 | 1092 | 465 |
2017-01-30 00:00:00.0000000 | 1261 | 647 |
2017-02-06 00:00:00.0000000 | 1744 | 1043 |
2017-02-13 00:00:00.0000000 | 1563 | 432 |
2017-02-20 00:00:00.0000000 | 1406 | 818 |
2017-02-27 00:00:00.0000000 | 1956 | 1429 |
2017-03-06 00:00:00.0000000 | 1593 | 848 |
2017-03-13 00:00:00.0000000 | 1801 | 1423 |
2017-03-20 00:00:00.0000000 | 1710 | 1017 |
2017-03-27 00:00:00.0000000 | 1796 | 1516 |
2017-04-03 00:00:00.0000000 | 1381 | 1008 |
2017-04-10 00:00:00.0000000 | 1756 | 1162 |
2017-04-17 00:00:00.0000000 | 1831 | 1409 |
2017-04-24 00:00:00.0000000 | 1823 | 1164 |
2017-05-01 00:00:00.0000000 | 1811 | 1353 |
2017-05-08 00:00:00.0000000 | 1691 | 1246 |
2017-05-15 00:00:00.0000000 | 1812 | 1608 |
2017-05-22 00:00:00.0000000 | 1740 | 1017 |
2017-05-29 00:00:00.0000000 | 960 | 756 |
:::image type="content" source="media/activity-metrics-plugin/activity-metrics-dcount-and-dcount-newvalues.png" border="false" alt-text="Table showing the count of distinct values (dcount_values) and of new distinct values (dcount_newvalues) that didn't appear in previous time window as specified in the query.":::
6.5 - funnel_sequence plugin
Calculates distinct count of users who have taken a sequence of states, and the distribution of previous/next states that have led to/were followed by the sequence. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate
funnel_sequence(
IdColumn,
TimelineColumn,
Start,
End,
MaxSequenceStepWindow, Step, StateColumn, Sequence)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
IdColumn | string | ✔️ | The column reference representing the ID. This column must be present in T. |
TimelineColumn | string | ✔️ | The column reference representing the timeline. This column must be present in T. |
Start | datetime, timespan, or long | ✔️ | The analysis start period. |
End | datetime, timespan, or long | ✔️ | The analysis end period. |
MaxSequenceStepWindow | datetime, timespan, or long | ✔️ | The value of the max allowed timespan between two sequential steps in the sequence. |
Step | datetime, timespan, or long | ✔️ | The analysis step period, or bin. |
StateColumn | string | ✔️ | The column reference representing the state. This column must be present in T. |
Sequence | dynamic | ✔️ | An array with the sequence values that are looked up in StateColumn . |
Returns
Returns three output tables, which are useful for constructing a sankey diagram for the analyzed sequence:
Table #1 - prev-sequence-next dcount
- TimelineColumn: the analyzed time window
- prev: the prev state (may be empty if there were any users that only had events for the searched sequence, but not any events prior to it).
- next: the next state (may be empty if there were any users that only had events for the searched sequence, but not any events that followed it).
- dcount: distinct count of IdColumn in the time window that transitioned prev –> Sequence –> next.
- samples: an array of IDs (from IdColumn) corresponding to the row's sequence (a maximum of 128 IDs are returned).
Table #2 - prev-sequence dcount
- TimelineColumn: the analyzed time window
- prev: the prev state (may be empty if there were any users that only had events for the searched sequence, but not any events prior to it).
- dcount: distinct count of IdColumn in the time window that transitioned prev –> Sequence.
- samples: an array of IDs (from IdColumn) corresponding to the row's sequence (a maximum of 128 IDs are returned).
Table #3 - sequence-next dcount
- TimelineColumn: the analyzed time window
- next: the next state (may be empty if there were any users that only had events for the searched sequence, but not any events that followed it).
- dcount: distinct count of IdColumn in the time window that transitioned Sequence –> next.
- samples: an array of IDs (from IdColumn) corresponding to the row's sequence (a maximum of 128 IDs are returned).
Examples
Exploring storm events
The following query looks at the table StormEvents (weather statistics for 2007) and shows which events happened before/after all Tornado events occurred in 2007.
// Looking on StormEvents statistics:
// Q1: What happens before Tornado event?
// Q2: What happens after Tornado event?
StormEvents
| evaluate funnel_sequence(EpisodeId, StartTime, datetime(2007-01-01), datetime(2008-01-01), 1d,365d, EventType, dynamic(['Tornado']))
Result includes three tables:
- Table #1: All possible variants of what happened before and after the sequence. For example, the second line means that there were 87 different events that had the following sequence: Hail -> Tornado -> Hail
StartTime | prev | next | dcount |
---|---|---|---|
2007-01-01 00:00:00.0000000 | 293 | ||
2007-01-01 00:00:00.0000000 | Hail | Hail | 87 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Thunderstorm Wind | 77 |
2007-01-01 00:00:00.0000000 | Hail | Thunderstorm Wind | 28 |
2007-01-01 00:00:00.0000000 | Hail | 28 | |
2007-01-01 00:00:00.0000000 | Hail | 27 | |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 25 | |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Hail | 24 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 24 | |
2007-01-01 00:00:00.0000000 | Flash Flood | Flash Flood | 12 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Flash Flood | 8 |
2007-01-01 00:00:00.0000000 | Flash Flood | 8 | |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Thunderstorm Wind | 6 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 6 | |
2007-01-01 00:00:00.0000000 | Flash Flood | 6 | |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Funnel Cloud | 6 |
2007-01-01 00:00:00.0000000 | Hail | Flash Flood | 4 |
2007-01-01 00:00:00.0000000 | Flash Flood | Thunderstorm Wind | 4 |
2007-01-01 00:00:00.0000000 | Hail | Funnel Cloud | 4 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Hail | 4 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 4 | |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | Funnel Cloud | 3 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Thunderstorm Wind | 2 |
2007-01-01 00:00:00.0000000 | Flash Flood | Funnel Cloud | 2 |
2007-01-01 00:00:00.0000000 | Flash Flood | Hail | 2 |
2007-01-01 00:00:00.0000000 | Strong Wind | Thunderstorm Wind | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Flash Flood | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Hail | 1 |
2007-01-01 00:00:00.0000000 | Hail | Flood | 1 |
2007-01-01 00:00:00.0000000 | Lightning | Hail | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | Lightning | 1 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Heavy Rain | 1 |
2007-01-01 00:00:00.0000000 | Flash Flood | Flood | 1 |
2007-01-01 00:00:00.0000000 | Flood | Flash Flood | 1 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 1 | |
2007-01-01 00:00:00.0000000 | Funnel Cloud | Lightning | 1 |
2007-01-01 00:00:00.0000000 | Lightning | Thunderstorm Wind | 1 |
2007-01-01 00:00:00.0000000 | Flood | Thunderstorm Wind | 1 |
2007-01-01 00:00:00.0000000 | Hail | Lightning | 1 |
2007-01-01 00:00:00.0000000 | Lightning | 1 | |
2007-01-01 00:00:00.0000000 | Tropical Storm | Hurricane (Typhoon) | 1 |
2007-01-01 00:00:00.0000000 | Coastal Flood | 1 | |
2007-01-01 00:00:00.0000000 | Rip Current | 1 | |
2007-01-01 00:00:00.0000000 | Heavy Snow | 1 | |
2007-01-01 00:00:00.0000000 | Strong Wind | 1 |
- Table #2: shows all distinct events grouped by the previous event. For example, the second line shows that there were a total of 150 events of Hail that happened just before Tornado.
StartTime | prev | dcount |
---|---|---|
2007-01-01 00:00:00.0000000 | 331 | |
2007-01-01 00:00:00.0000000 | Hail | 150 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 135 |
2007-01-01 00:00:00.0000000 | Flash Flood | 28 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 22 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 5 |
2007-01-01 00:00:00.0000000 | Flood | 2 |
2007-01-01 00:00:00.0000000 | Lightning | 2 |
2007-01-01 00:00:00.0000000 | Strong Wind | 2 |
2007-01-01 00:00:00.0000000 | Heavy Snow | 1 |
2007-01-01 00:00:00.0000000 | Rip Current | 1 |
2007-01-01 00:00:00.0000000 | Coastal Flood | 1 |
2007-01-01 00:00:00.0000000 | Tropical Storm | 1 |
- Table #3: shows all distinct events grouped by the next event. For example, the second line shows that there were a total of 145 events of Hail that happened after Tornado.
StartTime | next | dcount |
---|---|---|
2007-01-01 00:00:00.0000000 | 332 | |
2007-01-01 00:00:00.0000000 | Hail | 145 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 143 |
2007-01-01 00:00:00.0000000 | Flash Flood | 32 |
2007-01-01 00:00:00.0000000 | Funnel Cloud | 21 |
2007-01-01 00:00:00.0000000 | Lightning | 4 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 2 |
2007-01-01 00:00:00.0000000 | Flood | 2 |
2007-01-01 00:00:00.0000000 | Hurricane (Typhoon) | 1 |
Now, let's try to find out how the following sequence continues: Hail -> Tornado -> Thunderstorm Wind
StormEvents
| evaluate funnel_sequence(
EpisodeId,
StartTime,
datetime(2007-01-01),
datetime(2008-01-01),
1d,
365d,
EventType,
dynamic(['Hail', 'Tornado', 'Thunderstorm Wind'])
)
Skipping Table #1 and Table #2, and looking at Table #3, we can conclude that the sequence Hail -> Tornado -> Thunderstorm Wind ended with this sequence in 92 events, continued as Hail in 41 events, and turned back to Tornado in 14 events.
StartTime | next | dcount |
---|---|---|
2007-01-01 00:00:00.0000000 | 92 | |
2007-01-01 00:00:00.0000000 | Hail | 41 |
2007-01-01 00:00:00.0000000 | Tornado | 14 |
2007-01-01 00:00:00.0000000 | Flash Flood | 11 |
2007-01-01 00:00:00.0000000 | Lightning | 2 |
2007-01-01 00:00:00.0000000 | Heavy Rain | 1 |
2007-01-01 00:00:00.0000000 | Flood | 1 |
6.6 - funnel_sequence_completion plugin
Calculates a funnel of completed sequence steps while comparing different time periods. The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate
funnel_sequence_completion(
IdColumn,
TimelineColumn,
Start,
End,
BinSize,
StateColumn,
Sequence,
MaxSequenceStepPeriods)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
IdColumn | string | ✔️ | The column reference representing the ID. The column must be present in T. |
TimelineColumn | string | ✔️ | The column reference representing the timeline. The column must be present in T. |
Start | datetime, timespan, or long | ✔️ | The analysis start period. |
End | datetime, timespan, or long | ✔️ | The analysis end period. |
BinSize | datetime, timespan, or long | ✔️ | The analysis window size. Each window is analyzed separately. |
StateColumn | string | ✔️ | The column reference representing the state. The column must be present in T. |
Sequence | dynamic | ✔️ | An array with the sequence values that are looked up in StateColumn . |
MaxSequenceStepPeriods | dynamic | ✔️ | An array with the values of the max allowed timespan between the first and last sequential steps in the sequence. Each period in the array generates a funnel analysis result. |
Returns
Returns a single table useful for constructing a funnel diagram for the analyzed sequence:
TimelineColumn
: the analyzed time window (bin), each bin in the analysis timeframe (Start to End) generates a funnel analysis separately.StateColumn
: the state of the sequence.Period
: the maximal period allowed for completing steps in the funnel sequence measured from the first step in the sequence. Each value in MaxSequenceStepPeriods generates a funnel analysis with a separate period.dcount
: distinct count ofIdColumn
in time window that transitioned from first sequence state to the value ofStateColumn
.
Examples
Exploring Storm Events
The following query checks the completion funnel of the sequence Hail -> Tornado -> Thunderstorm Wind in an "overall" time of one hour, four hours, and one day.
let _start = datetime(2007-01-01);
let _end = datetime(2008-01-01);
let _windowSize = 365d;
let _sequence = dynamic(['Hail', 'Tornado', 'Thunderstorm Wind']);
let _periods = dynamic([1h, 4h, 1d]);
StormEvents
| evaluate funnel_sequence_completion(EpisodeId, StartTime, _start, _end, _windowSize, EventType, _sequence, _periods)
Output
StartTime | EventType | Period | dcount |
---|---|---|---|
2007-01-01 00:00:00.0000000 | Hail | 01:00:00 | 2877 |
2007-01-01 00:00:00.0000000 | Tornado | 01:00:00 | 208 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 01:00:00 | 87 |
2007-01-01 00:00:00.0000000 | Hail | 04:00:00 | 2877 |
2007-01-01 00:00:00.0000000 | Tornado | 04:00:00 | 231 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 04:00:00 | 141 |
2007-01-01 00:00:00.0000000 | Hail | 1.00:00:00 | 2877 |
2007-01-01 00:00:00.0000000 | Tornado | 1.00:00:00 | 244 |
2007-01-01 00:00:00.0000000 | Thunderstorm Wind | 1.00:00:00 | 155 |
Understanding the results:
The outcome is three funnels (for periods of one hour, four hours, and one day). For each funnel step, the distinct count of EpisodeId values is shown. You can see that the more time that is given to complete the whole sequence of Hail -> Tornado -> Thunderstorm Wind, the higher the dcount value obtained. In other words, more occurrences of the sequence reached that funnel step.
6.7 - new_activity_metrics plugin
Calculates useful activity metrics (distinct count values, distinct count of new values, retention rate, and churn rate) for the cohort of New Users
. Each cohort of New Users
(all users, which were first seen in time window) is compared to all prior cohorts.
Comparison takes into account all previous time windows. For example, for records from T2 to T3, the distinct count of users will be all users in T3 who weren't seen in either T1 or T2.
The plugin is invoked with the evaluate
operator.
Syntax
TabularExpression | evaluate
new_activity_metrics(
IdColumn,
TimelineColumn,
Start,
End,
Window [,
Cohort] [,
dim1,
dim2,
…] [,
Lookback] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
TabularExpression | string | ✔️ | The tabular expression for which to calculate activity metrics. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents the timeline. |
Start | scalar | ✔️ | The value of the analysis start period. |
End | scalar | ✔️ | The value of the analysis end period. |
Window | scalar | ✔️ | The value of the analysis window period. Can be a numeric, datetime, or timespan value, or a string that is one of week , month or year , in which case all periods will be startofweek/startofmonth/startofyear accordingly. When using startofweek , make sure start time is a Sunday, otherwise first cohort will be empty (since startofweek is considered to be a Sunday). |
Cohort | scalar | Indicates a specific cohort. If not provided, all cohorts corresponding to the analysis time window are calculated and returned. | |
dim1, dim2, … | dynamic | An array of the dimensions columns that slice the activity metrics calculation. | |
Lookback | string | A tabular expression with a set of IDs that belong to the ’look back’ period. |
Returns
Returns a table that contains the following for each combination of ‘from’ and ’to’ timeline periods and for each existing column (dimensions) combination:
- distinct count values
- distinct count of new values
- retention rate
- churn rate
Output table schema is:
from_TimelineColumn | to_TimelineColumn | dcount_new_values | dcount_retained_values | dcount_churn_values | retention_rate | churn_rate | dim1 | .. | dim_n |
---|---|---|---|---|---|---|---|---|---|
type: as of TimelineColumn | same | long | long | double | double | double | .. | .. | .. |
- from_TimelineColumn - the cohort of new users. Metrics in this record refer to all users who were first seen in this period. The decision on first seen takes into account all previous periods in the analysis period.
- to_TimelineColumn - the period being compared to.
- dcount_new_values - the number of distinct users in to_TimelineColumn that weren't seen in all periods prior to and including from_TimelineColumn.
- dcount_retained_values - out of all new users first seen in from_TimelineColumn, the number of distinct users that were seen in to_TimelineColumn.
- dcount_churn_values - out of all new users first seen in from_TimelineColumn, the number of distinct users that weren't seen in to_TimelineColumn.
- retention_rate - the percent of dcount_retained_values out of the cohort (users first seen in from_TimelineColumn).
- churn_rate - the percent of dcount_churn_values out of the cohort (users first seen in from_TimelineColumn).
Examples
The following sample dataset shows which users were seen on which days. The table was generated based on a source Users table, as follows:
Users | summarize tostring(make_set(user)) by bin(Timestamp, 1d) | order by Timestamp asc;
Output
Timestamp | set_user |
---|---|
2019-11-01 00:00:00.0000000 | [0,2,3,4] |
2019-11-02 00:00:00.0000000 | [0,1,3,4,5] |
2019-11-03 00:00:00.0000000 | [0,2,4,5] |
2019-11-04 00:00:00.0000000 | [0,1,2,3] |
2019-11-05 00:00:00.0000000 | [0,1,2,3,4] |
The output of the plugin for the original table is the following:
let StartDate = datetime(2019-11-01 00:00:00);
let EndDate = datetime(2019-11-07 00:00:00);
Users
| evaluate new_activity_metrics(user, Timestamp, StartDate, EndDate-1tick, 1d)
| where from_Timestamp < datetime(2019-11-03 00:00:00.0000000)
Output
R | from_Timestamp | to_Timestamp | dcount_new_values | dcount_retained_values | dcount_churn_values | retention_rate | churn_rate |
---|---|---|---|---|---|---|---|
1 | 2019-11-01 00:00:00.0000000 | 2019-11-01 00:00:00.0000000 | 4 | 4 | 0 | 1 | 0 |
2 | 2019-11-01 00:00:00.0000000 | 2019-11-02 00:00:00.0000000 | 2 | 3 | 1 | 0.75 | 0.25 |
3 | 2019-11-01 00:00:00.0000000 | 2019-11-03 00:00:00.0000000 | 1 | 3 | 1 | 0.75 | 0.25 |
4 | 2019-11-01 00:00:00.0000000 | 2019-11-04 00:00:00.0000000 | 1 | 3 | 1 | 0.75 | 0.25 |
5 | 2019-11-01 00:00:00.0000000 | 2019-11-05 00:00:00.0000000 | 1 | 4 | 0 | 1 | 0 |
6 | 2019-11-01 00:00:00.0000000 | 2019-11-06 00:00:00.0000000 | 0 | 0 | 4 | 0 | 1 |
7 | 2019-11-02 00:00:00.0000000 | 2019-11-02 00:00:00.0000000 | 2 | 2 | 0 | 1 | 0 |
8 | 2019-11-02 00:00:00.0000000 | 2019-11-03 00:00:00.0000000 | 0 | 1 | 1 | 0.5 | 0.5 |
9 | 2019-11-02 00:00:00.0000000 | 2019-11-04 00:00:00.0000000 | 0 | 1 | 1 | 0.5 | 0.5 |
10 | 2019-11-02 00:00:00.0000000 | 2019-11-05 00:00:00.0000000 | 0 | 1 | 1 | 0.5 | 0.5 |
11 | 2019-11-02 00:00:00.0000000 | 2019-11-06 00:00:00.0000000 | 0 | 0 | 2 | 0 | 1 |
Following is an analysis of a few records from the output:
Record R=3, from_TimelineColumn = 2019-11-01, to_TimelineColumn = 2019-11-03:
- The users considered for this record are all new users seen on 11/1. Since this is the first period, these are all users in that bin – [0,2,3,4].
- dcount_new_values – the number of users on 11/3 who weren't seen on 11/1. This includes a single user – 5.
- dcount_retained_values – out of all new users on 11/1, how many were retained until 11/3? There are three values ([0,2,4]), while dcount_churn_values is one (user=3).
- retention_rate = 0.75 – the three retained users out of the four new users who were first seen on 11/1.
Record R=9, from_TimelineColumn = 2019-11-02, to_TimelineColumn = 2019-11-04:
- This record focuses on the new users who were first seen on 11/2 – users 1 and 5.
- dcount_new_values – the number of users on 11/4 who weren't seen through all periods T0 .. from_Timestamp. Meaning, users who are seen on 11/4 but who weren't seen on either 11/1 or 11/2 – there are no such users.
- dcount_retained_values – out of all new users on 11/2 ([1,5]), how many were retained until 11/4? There's one such user ([1]), while dcount_churn_values is one (user 5).
- retention_rate is 0.5 – the single user that was retained on 11/4 out of the two new users on 11/2.
Weekly retention rate, and churn rate (single week)
The next query calculates retention and churn rates for a week-over-week window for the New Users cohort (users who arrived in the first week).
// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Take only the first week cohort (last parameter)
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d, _start)
| project from_Day, to_Day, retention_rate, churn_rate
Output
from_Day | to_Day | retention_rate | churn_rate |
---|---|---|---|
2017-05-01 00:00:00.0000000 | 2017-05-01 00:00:00.0000000 | 1 | 0 |
2017-05-01 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 0.544632768361582 | 0.455367231638418 |
2017-05-01 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0.031638418079096 | 0.968361581920904 |
2017-05-01 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0 | 1 |
Weekly retention rate, and churn rate (complete matrix)
The next query calculates retention and churn rates for a week-over-week window for the New Users cohort. Whereas the previous example calculated the statistics for a single week, the following query produces an NxN table for each from/to combination.
// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
// Last parameter is omitted -
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d)
| project from_Day, to_Day, retention_rate, churn_rate
Output
from_Day | to_Day | retention_rate | churn_rate |
---|---|---|---|
2017-05-01 00:00:00.0000000 | 2017-05-01 00:00:00.0000000 | 1 | 0 |
2017-05-01 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 0.190397350993377 | 0.809602649006622 |
2017-05-01 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0 | 1 |
2017-05-08 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 1 | 0 |
2017-05-08 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0.405263157894737 | 0.594736842105263 |
2017-05-08 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0.227631578947368 | 0.772368421052632 |
2017-05-08 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0 | 1 |
2017-05-15 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 1 | 0 |
2017-05-15 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0.785488958990536 | 0.214511041009464 |
2017-05-15 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0.237644584647739 | 0.762355415352261 |
2017-05-22 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 1 | 0 |
2017-05-22 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0.621835443037975 | 0.378164556962025 |
2017-05-29 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 1 | 0 |
Weekly retention rate with lookback period
The following query calculates the retention rate of the New Users cohort while taking into consideration a lookback period: a tabular query with a set of IDs that is used to define the New Users cohort (all IDs that don't appear in this set are New Users). The query examines the retention behavior of the New Users during the analysis period.
// Generate random data of user activities
let _lookback = datetime(2017-02-01);
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
let _data = range Day from _lookback to _end step 1d
| extend d = tolong((Day - _lookback) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1)
| mv-expand id=_users to typeof(long) limit 1000000;
//
let lookback_data = _data | where Day < _start | project Day, id;
_data
| evaluate new_activity_metrics(id, Day, _start, _end, 7d, _start, lookback_data)
| project from_Day, to_Day, retention_rate
Output
from_Day | to_Day | retention_rate |
---|---|---|
2017-05-01 00:00:00.0000000 | 2017-05-01 00:00:00.0000000 | 1 |
2017-05-01 00:00:00.0000000 | 2017-05-08 00:00:00.0000000 | 0.404081632653061 |
2017-05-01 00:00:00.0000000 | 2017-05-15 00:00:00.0000000 | 0.257142857142857 |
2017-05-01 00:00:00.0000000 | 2017-05-22 00:00:00.0000000 | 0.296326530612245 |
2017-05-01 00:00:00.0000000 | 2017-05-29 00:00:00.0000000 | 0.0587755102040816 |
6.8 - rolling_percentile plugin
Returns an estimate for the specified percentile of the ValueColumn population in a rolling (sliding) BinsPerWindow size window per BinSize.
The plugin is invoked with the evaluate
operator.
Syntax
T | evaluate
rolling_percentile(
ValueColumn,
Percentile,
IndexColumn,
BinSize,
BinsPerWindow [,
dim1,
dim2,
…] )
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
ValueColumn | string | ✔️ | The name of the column used to calculate the percentiles. |
Percentile | int, long, or real | ✔️ | Scalar with the percentile to calculate. |
IndexColumn | string | ✔️ | The name of the column over which to run the rolling window. |
BinSize | int, long, real, datetime, or timespan | ✔️ | Scalar with size of the bins to apply over the IndexColumn. |
BinsPerWindow | int | ✔️ | The number of bins included in each window. |
dim1, dim2, … | string | A list of the dimensions columns to slice by. |
Returns
Returns a table with a row per each bin (and combination of dimensions if specified) that has the rolling percentile of values in the window ending at the bin (inclusive). Output table schema is:
IndexColumn | dim1 | … | dim_n | rolling_BinsPerWindow_percentile_ValueColumn_Pct |
---|---|---|---|---|
Examples
Rolling 3-day median value per day
The next query calculates a 3-day median value in daily granularity. Each row in the output represents the median value for the last 3 bins (days), including the bin itself.
let T =
range idx from 0 to 24 * 10 - 1 step 1
| project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
| extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3)
Output
Timestamp | rolling_3_percentile_val_50 |
---|---|
2018-01-01 00:00:00.0000000 | 12 |
2018-01-02 00:00:00.0000000 | 24 |
2018-01-03 00:00:00.0000000 | 36 |
2018-01-04 00:00:00.0000000 | 60 |
2018-01-05 00:00:00.0000000 | 84 |
2018-01-06 00:00:00.0000000 | 108 |
2018-01-07 00:00:00.0000000 | 132 |
2018-01-08 00:00:00.0000000 | 156 |
2018-01-09 00:00:00.0000000 | 180 |
2018-01-10 00:00:00.0000000 | 204 |
Rolling 3-day median value per day by dimension
Same example from above, but now also calculates the rolling window partitioned for each value of the dimension.
let T =
range idx from 0 to 24 * 10 - 1 step 1
| project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
| extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3, EvenOrOdd)
Output
Timestamp | EvenOrOdd | rolling_3_percentile_val_50 |
---|---|---|
2018-01-01 00:00:00.0000000 | Even | 12 |
2018-01-02 00:00:00.0000000 | Even | 24 |
2018-01-03 00:00:00.0000000 | Even | 36 |
2018-01-04 00:00:00.0000000 | Even | 60 |
2018-01-05 00:00:00.0000000 | Even | 84 |
2018-01-06 00:00:00.0000000 | Even | 108 |
2018-01-07 00:00:00.0000000 | Even | 132 |
2018-01-08 00:00:00.0000000 | Even | 156 |
2018-01-09 00:00:00.0000000 | Even | 180 |
2018-01-10 00:00:00.0000000 | Even | 204 |
2018-01-01 00:00:00.0000000 | Odd | 11 |
2018-01-02 00:00:00.0000000 | Odd | 23 |
2018-01-03 00:00:00.0000000 | Odd | 35 |
2018-01-04 00:00:00.0000000 | Odd | 59 |
2018-01-05 00:00:00.0000000 | Odd | 83 |
2018-01-06 00:00:00.0000000 | Odd | 107 |
2018-01-07 00:00:00.0000000 | Odd | 131 |
2018-01-08 00:00:00.0000000 | Odd | 155 |
2018-01-09 00:00:00.0000000 | Odd | 179 |
2018-01-10 00:00:00.0000000 | Odd | 203 |
6.9 - rows_near plugin
Finds rows near a specified condition.
The plugin is invoked with the evaluate operator.
Syntax
T | evaluate rows_near(Condition, NumRows [, RowsAfter])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
Condition | bool | ✔️ | Represents the condition to find rows around. |
NumRows | int | ✔️ | The number of rows to find before and after the condition. |
RowsAfter | int | When specified, overrides the number of rows to find after the condition. |
Returns
Returns every row from the input that is within NumRows rows of a true Condition. When RowsAfter is specified, returns every row from the input that is within NumRows rows before, or RowsAfter rows after, a true Condition.
Example
The following query finds rows with an "Error" State and returns two rows before and after each "Error" record.
datatable (Timestamp:datetime, Value:long, State:string )
[
datetime(2021-06-01), 1, "Success",
datetime(2021-06-02), 4, "Success",
datetime(2021-06-03), 3, "Success",
datetime(2021-06-04), 11, "Success",
datetime(2021-06-05), 15, "Success",
datetime(2021-06-06), 2, "Success",
datetime(2021-06-07), 19, "Error",
datetime(2021-06-08), 12, "Success",
datetime(2021-06-09), 7, "Success",
datetime(2021-06-10), 9, "Success",
datetime(2021-06-11), 4, "Success",
datetime(2021-06-12), 1, "Success",
]
| sort by Timestamp asc
| evaluate rows_near(State == "Error", 2)
Output
Timestamp | Value | State |
---|---|---|
2021-06-05 00:00:00.0000000 | 15 | Success |
2021-06-06 00:00:00.0000000 | 2 | Success |
2021-06-07 00:00:00.0000000 | 19 | Error |
2021-06-08 00:00:00.0000000 | 12 | Success |
2021-06-09 00:00:00.0000000 | 7 | Success |
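The optional RowsAfter argument changes only the number of rows returned after the condition. As a sketch on the same data, the following asks for two rows before and a single row after each "Error" record (output not shown):
datatable (Timestamp:datetime, Value:long, State:string )
[
datetime(2021-06-01), 1, "Success",
datetime(2021-06-02), 4, "Success",
datetime(2021-06-03), 3, "Success",
datetime(2021-06-04), 11, "Success",
datetime(2021-06-05), 15, "Success",
datetime(2021-06-06), 2, "Success",
datetime(2021-06-07), 19, "Error",
datetime(2021-06-08), 12, "Success",
datetime(2021-06-09), 7, "Success",
datetime(2021-06-10), 9, "Success",
datetime(2021-06-11), 4, "Success",
datetime(2021-06-12), 1, "Success",
]
| sort by Timestamp asc
| evaluate rows_near(State == "Error", 2, 1)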
6.10 - sequence_detect plugin
Detects sequence occurrences based on provided predicates. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate sequence_detect(TimelineColumn, MaxSequenceStepWindow, MaxSequenceSpan, Expr1, Expr2, …, Dim1, Dim2, …)
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
TimelineColumn | string | ✔️ | The column reference representing the timeline. It must be present in the source expression. |
MaxSequenceStepWindow | timespan | ✔️ | The maximum allowed timespan between two sequential steps in the sequence. |
MaxSequenceSpan | timespan | ✔️ | The maximum timespan for the sequence to complete all steps. |
Expr1, Expr2, … | string | ✔️ | The boolean predicate expressions defining sequence steps. |
Dim1, Dim2, … | string | ✔️ | The dimension expressions that are used to correlate sequences. |
Returns
Returns a single table where each row in the table represents a single sequence occurrence:
- Dim1, Dim2, …: dimension columns that were used to correlate sequences.
- Expr1_TimelineColumn, Expr2_TimelineColumn, …: columns with time values representing the time of each sequence step (for example, heat_StartTime and wildfire_StartTime in the StormEvents example below).
- Duration: the overall sequence time window.
Examples
The following query shows the general call pattern: it searches the table T for sequences in which Col1 == 'Val' is followed by Col2 == 'Val2', with at most 10 minutes between steps and at most one hour for the whole sequence, correlated by Dim1 and Dim2.
T | evaluate sequence_detect(datetime_column, 10m, 1h, e1 = (Col1 == 'Val'), e2 = (Col2 == 'Val2'), Dim1, Dim2)
Exploring Storm Events
The following query looks at the table StormEvents (weather statistics for 2007) and shows cases where a sequence of 'Excessive Heat' was followed by 'Wildfire' within five days.
StormEvents
| evaluate sequence_detect(
StartTime,
5d, // step max-time
5d, // sequence max-time
heat=(EventType == "Excessive Heat"),
wildfire=(EventType == 'Wildfire'),
State
)
Output
State | heat_StartTime | wildfire_StartTime | Duration |
---|---|---|---|
CALIFORNIA | 2007-05-08 00:00:00.0000000 | 2007-05-08 16:02:00.0000000 | 16:02:00 |
CALIFORNIA | 2007-05-08 00:00:00.0000000 | 2007-05-10 11:30:00.0000000 | 2.11:30:00 |
CALIFORNIA | 2007-07-04 09:00:00.0000000 | 2007-07-05 23:01:00.0000000 | 1.14:01:00 |
SOUTH DAKOTA | 2007-07-23 12:00:00.0000000 | 2007-07-27 09:00:00.0000000 | 3.21:00:00 |
TEXAS | 2007-08-10 08:00:00.0000000 | 2007-08-11 13:56:00.0000000 | 1.05:56:00 |
CALIFORNIA | 2007-08-31 08:00:00.0000000 | 2007-09-01 11:28:00.0000000 | 1.03:28:00 |
CALIFORNIA | 2007-08-31 08:00:00.0000000 | 2007-09-02 13:30:00.0000000 | 2.05:30:00 |
CALIFORNIA | 2007-09-02 12:00:00.0000000 | 2007-09-02 13:30:00.0000000 | 01:30:00 |
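Because the plugin returns ordinary columns, its output can be filtered and aggregated like any other table. The following sketch keeps only heat-to-wildfire sequences that completed within one day and counts them per state; the one-day threshold is an arbitrary choice for illustration.
StormEvents
| evaluate sequence_detect(
    StartTime,
    5d, // step max-time
    5d, // sequence max-time
    heat=(EventType == "Excessive Heat"),
    wildfire=(EventType == 'Wildfire'),
    State
)
| where Duration <= 1d
| summarize sequences = count() by State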
6.11 - session_count plugin
Calculates the session count based on the ID column over a timeline. The plugin is invoked with the evaluate operator.
Syntax
TabularExpression | evaluate session_count(IdColumn, TimelineColumn, Start, End, Bin, LookBackWindow [, dim1, dim2, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
TabularExpression | string | ✔️ | The tabular expression that serves as input. |
IdColumn | string | ✔️ | The name of the column with ID values that represents user activity. |
TimelineColumn | string | ✔️ | The name of the column that represents the timeline. |
Start | scalar | ✔️ | The start of the analysis period. |
End | scalar | ✔️ | The end of the analysis period. |
Bin | scalar | ✔️ | The session’s analysis step period. |
LookBackWindow | scalar | ✔️ | The session lookback period. If the ID from IdColumn appears in a time window within LookBackWindow , the session is considered to be an existing one. If the ID doesn’t appear, then the session is considered to be new. |
dim1, dim2, … | string | A list of the dimension columns that slice the session count calculation. |
Returns
Returns a table that has the session count values for each timeline period and for each existing dimension combination.
Output table schema is:
TimelineColumn | dim1 | .. | dim_n | count_sessions |
---|---|---|---|---|
type: as of TimelineColumn | .. | .. | .. | long |
Examples
For this example, the data is deterministic, and we use a table with two columns:
- Timeline: a running number from 1 to 10,000
- Id: the ID of the user, from 1 to 50
An Id appears at a specific Timeline slot if it is a divisor of Timeline (Timeline % Id == 0). An event with Id==1 appears at every Timeline slot, an event with Id==2 at every second Timeline slot, and so on.
Here are 20 lines of the data:
let _data = range Timeline from 1 to 10000 step 1
| extend __key = 1
| join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
| where Timeline % Id == 0
| project Timeline, Id;
// Look on few lines of the data
_data
| order by Timeline asc, Id asc
| take 20
Output
Timeline | Id |
---|---|
1 | 1 |
2 | 1 |
2 | 2 |
3 | 1 |
3 | 3 |
4 | 1 |
4 | 2 |
4 | 4 |
5 | 1 |
5 | 5 |
6 | 1 |
6 | 2 |
6 | 3 |
6 | 6 |
7 | 1 |
7 | 7 |
8 | 1 |
8 | 2 |
8 | 4 |
8 | 8 |
Let's define a session as follows: a session is considered active as long as the user (Id) appears at least once within a timeframe of 100 time slots, and the session lookback window is 41 time slots. The next query shows the count of active sessions according to this definition.
let _data = range Timeline from 1 to 9999 step 1
| extend __key = 1
| join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
| where Timeline % Id == 0
| project Timeline, Id;
// End of data definition
_data
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41)
| render linechart
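Dimensions can be added after LookBackWindow to slice the session count. The following is a minimal sketch on the same data; Parity is a hypothetical dimension added here only for illustration.
let _data = range Timeline from 1 to 9999 step 1
| extend __key = 1
| join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
| where Timeline % Id == 0
| project Timeline, Id;
// End of data definition
_data
| extend Parity = iff(Id % 2 == 0, "Even", "Odd") // hypothetical dimension for illustration
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41, Parity)
| render linechart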
6.12 - sliding_window_counts plugin
Calculates counts and distinct count of values in a sliding window over a lookback period, using the technique described in the Perform aggregations over a sliding window example. The plugin is invoked with the evaluate operator.
Syntax
T | evaluate sliding_window_counts(IdColumn, TimelineColumn, Start, End, LookbackWindow, Bin [, dim1, dim2, …])
Parameters
Name | Type | Required | Description |
---|---|---|---|
T | string | ✔️ | The input tabular expression. |
IdColumn | string | ✔️ | The name of the column with ID values that represent user activity. |
TimelineColumn | string | ✔️ | The name of the column representing the timeline. |
Start | int, long, real, datetime, or timespan | ✔️ | The analysis start period. |
End | int, long, real, datetime, or timespan | ✔️ | The analysis end period. |
LookbackWindow | int, long, real, datetime, or timespan | ✔️ | The lookback period. This value should be a multiple of the Bin value; otherwise, the LookbackWindow is rounded down to a multiple of the Bin value. For example, for the distinct count of users in the past 7 days, use LookbackWindow = 7d. |
Bin | int, long, real, datetime, timespan, or string | ✔️ | The analysis step period. The possible string values are week, month, and year, for which all periods will be startofweek, startofmonth, or startofyear, respectively. |
dim1, dim2, … | string | A list of the dimension columns that slice the activity metrics calculation. |
Returns
Returns a table that has the count and distinct count values of IDs in the lookback period, for each timeline period (by bin) and for each existing dimension combination.
Output table schema is:
TimelineColumn | dim1 | .. | dim_n | count | dcount |
---|---|---|---|---|---|
type: as of TimelineColumn | .. | .. | .. | long | long |
Example
Calculate the count and distinct count (dcount) of users over a 3-day lookback window, for each day in the analysis period.
let start = datetime(2017-08-01);
let end = datetime(2017-08-07);
let lookbackWindow = 3d;
let bin = 1d;
let T = datatable(UserId: string, Timestamp: datetime)
[
'Bob', datetime(2017-08-01),
'David', datetime(2017-08-01),
'David', datetime(2017-08-01),
'John', datetime(2017-08-01),
'Bob', datetime(2017-08-01),
'Ananda', datetime(2017-08-02),
'Atul', datetime(2017-08-02),
'John', datetime(2017-08-02),
'Ananda', datetime(2017-08-03),
'Atul', datetime(2017-08-03),
'Atul', datetime(2017-08-03),
'John', datetime(2017-08-03),
'Bob', datetime(2017-08-03),
'Betsy', datetime(2017-08-04),
'Bob', datetime(2017-08-05),
];
T
| evaluate sliding_window_counts(UserId, Timestamp, start, end, lookbackWindow, bin)
Output
Timestamp | count | dcount |
---|---|---|
2017-08-01 00:00:00.0000000 | 5 | 3 |
2017-08-02 00:00:00.0000000 | 8 | 5 |
2017-08-03 00:00:00.0000000 | 13 | 5 |
2017-08-04 00:00:00.0000000 | 9 | 5 |
2017-08-05 00:00:00.0000000 | 7 | 5 |
2017-08-06 00:00:00.0000000 | 2 | 2 |
2017-08-07 00:00:00.0000000 | 1 | 1 |
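To count users over the past 7 days instead of 3, only the lookback argument changes, as noted in the LookbackWindow parameter description. A sketch on the same data (output not shown):
let start = datetime(2017-08-01);
let end = datetime(2017-08-07);
let lookbackWindow = 7d; // one-week lookback instead of three days
let bin = 1d;
let T = datatable(UserId: string, Timestamp: datetime)
[
'Bob', datetime(2017-08-01),
'David', datetime(2017-08-01),
'David', datetime(2017-08-01),
'John', datetime(2017-08-01),
'Bob', datetime(2017-08-01),
'Ananda', datetime(2017-08-02),
'Atul', datetime(2017-08-02),
'John', datetime(2017-08-02),
'Ananda', datetime(2017-08-03),
'Atul', datetime(2017-08-03),
'Atul', datetime(2017-08-03),
'John', datetime(2017-08-03),
'Bob', datetime(2017-08-03),
'Betsy', datetime(2017-08-04),
'Bob', datetime(2017-08-05),
];
T
| evaluate sliding_window_counts(UserId, Timestamp, start, end, lookbackWindow, bin)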
6.13 - User Analytics
This section describes Kusto extensions (plugins) for user analytics scenarios.
Scenario | Plugin | Details | User Experience |
---|---|---|---|
Counting new users over time | activity_counts_metrics | Returns counts/dcounts/new counts for each time window. Each time window is compared to all previous time windows | Kusto.Explorer: Report Gallery |
Period-over-period: retention/churn rate and new users | activity_metrics | Returns dcount , retention/churn rate for each time window. Each time window is compared to previous time window | Kusto.Explorer: Report Gallery |
Users count and dcount over sliding window | sliding_window_counts | For each time window, returns count and dcount over a lookback period, in a sliding window manner | |
New-users cohort: retention/churn rate and new users | new_activity_metrics | Compares between cohorts of new users (all users that were first seen in time window). Each cohort is compared to all prior cohorts. Comparison takes into account all previous time windows | Kusto.Explorer: Report Gallery |
Active Users: distinct counts | active_users_count | Returns distinct users for each time window. A user is only considered if it appears in at least X distinct periods in a specified lookback period. | |
User Engagement: DAU/WAU/MAU | activity_engagement | Compares between an inner time window (for example, daily) and an outer (for example, weekly) for computing engagement (for example, DAU/WAU) | Kusto.Explorer: Report Gallery |
Sessions: count active sessions | session_count | Counts sessions, where a session is defined by a time period: a user record starts a new session if it hasn't been seen in the lookback period from the current record |
Funnels: previous and next state sequence analysis | funnel_sequence | Counts distinct users who have taken a sequence of events, and the previous or next events that led to or followed the sequence. Useful for constructing Sankey diagrams |
Funnels: sequence completion analysis | funnel_sequence_completion | Computes the distinct count of users that have completed a specified sequence in each time window | |