1 - Data reshaping plugins

1.1 - bag_unpack plugin

Learn how to use the bag_unpack plugin to unpack a dynamic column.

The bag_unpack plugin unpacks a single column of type dynamic, by treating each property bag top-level slot as a column. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate bag_unpack( Column [, OutputColumnPrefix ] [, columnsConflict ] [, ignoredProperties ] ) [: OutputSchema]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| T | string | ✔️ | The tabular input whose column Column is to be unpacked. |
| Column | dynamic | ✔️ | The column of T to unpack. |
| OutputColumnPrefix | string | | A common prefix to add to all columns produced by the plugin. |
| columnsConflict | string | | The direction for column conflict resolution. Valid values: error - query produces an error (default); replace_source - source column is replaced; keep_source - source column is kept. |
| ignoredProperties | dynamic | | An optional set of bag properties to be ignored. |
| OutputSchema | | | The names and types for the expected columns of the bag_unpack plugin output. Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. For syntax information, see Output schema syntax. |

Output schema syntax

( ColumnName : ColumnType [, …] )

To add all columns of the input table to the plugin output, use a wildcard * as the first parameter, as follows:

( * , ColumnName : ColumnType [, …] )

Returns

The bag_unpack plugin returns a table with as many records as its tabular input (T). The schema of the table is the same as the schema of its tabular input with the following modifications:

  • The specified input column (Column) is removed.
  • The schema is extended with as many columns as there are distinct slots in the top-level property bag values of T. The name of each column corresponds to the name of each slot, optionally prefixed by OutputColumnPrefix. Its type is either the type of the slot, if all values of the same slot have the same type, or dynamic, if the values differ in type.
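To make the type rule above concrete, here is a small sketch that isn't part of the original article: when the Age slot holds a number in one record and a string in another, the unpacked Age column is typed as dynamic rather than long.

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age": 20}),
    dynamic({"Name": "Dave", "Age": "forty"})
]
| evaluate bag_unpack(d)

Both records unpack into Name and Age columns, but Age comes back as a dynamic column because its values differ in type.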

Examples

Expand a bag

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d)

Output

| Age | Name |
|--|--|
| 20 | John |
| 40 | Dave |
| 30 | Jasmine |

Expand a bag with OutputColumnPrefix

Expand a bag and use the OutputColumnPrefix option to produce column names that begin with the prefix ‘Property_’.

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, 'Property_')

Output

| Property_Age | Property_Name |
|--|--|
| 20 | John |
| 40 | Dave |
| 30 | Jasmine |

Expand a bag with columnsConflict

Expand a bag and use the columnsConflict option to resolve conflicts between existing columns and columns produced by the bag_unpack() operator.

datatable(Name:string, d:dynamic)
[
    'Old_name', dynamic({"Name": "John", "Age":20}),
    'Old_name', dynamic({"Name": "Dave", "Age":40}),
    'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='replace_source') // Use new name

Output

| Age | Name |
|--|--|
| 20 | John |
| 40 | Dave |
| 30 | Jasmine |

datatable(Name:string, d:dynamic)
[
    'Old_name', dynamic({"Name": "John", "Age":20}),
    'Old_name', dynamic({"Name": "Dave", "Age":40}),
    'Old_name', dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d, columnsConflict='keep_source') // Keep old name

Output

| Age | Name |
|--|--|
| 20 | Old_name |
| 40 | Old_name |
| 30 | Old_name |

Expand a bag with ignoredProperties

Expand a bag and use the ignoredProperties option to ignore certain properties in the property bag.

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20, "Address": "Address-1" }),
    dynamic({"Name": "Dave", "Age":40, "Address": "Address-2"}),
    dynamic({"Name": "Jasmine", "Age":30, "Address": "Address-3"}),
]
// Ignore 'Age' and 'Address' properties
| evaluate bag_unpack(d, ignoredProperties=dynamic(['Address', 'Age']))

Output

| Name |
|--|
| John |
| Dave |
| Jasmine |

Expand a bag with a query-defined OutputSchema

Expand a bag and use the OutputSchema option to allow various optimizations to be evaluated before running the actual query.

datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d) : (Name:string, Age:long)

Output

| Name | Age |
|--|--|
| John | 20 |
| Dave | 40 |
| Jasmine | 30 |

Expand a bag and use the OutputSchema option to allow various optimizations to be evaluated before running the actual query. Use a wildcard * to return all columns of the input table.

datatable(d:dynamic, Description: string)
[
    dynamic({"Name": "John", "Age":20}), "Student",
    dynamic({"Name": "Dave", "Age":40}), "Teacher",
    dynamic({"Name": "Jasmine", "Age":30}), "Student",
]
| evaluate bag_unpack(d) : (*, Name:string, Age:long)

Output

| Description | Name | Age |
|--|--|--|
| Student | John | 20 |
| Teacher | Dave | 40 |
| Student | Jasmine | 30 |

1.2 - narrow plugin

Learn how to use the narrow plugin to display a wide table.

The narrow plugin “unpivots” a wide table into a table with three columns:

  • Row number
  • Column type
  • Column value (as string)

The narrow plugin is designed mainly for display purposes, as it allows wide tables to be displayed comfortably without the need for horizontal scrolling.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate narrow()

Examples

The following example shows an easy way to read the output of the Kusto .show diagnostics management command.

.show diagnostics
 | evaluate narrow()

The result of .show diagnostics is itself a table with a single row and 33 columns. By using the narrow plugin, we “rotate” the output to something like this:

| Row | Column | Value |
|--|--|--|
| 0 | IsHealthy | True |
| 0 | IsRebalanceRequired | False |
| 0 | IsScaleOutRequired | False |
| 0 | MachinesTotal | 2 |
| 0 | MachinesOffline | 0 |
| 0 | NodeLastRestartedOn | 2017-03-14 10:59:18.9263023 |
| 0 | AdminLastElectedOn | 2017-03-14 10:58:41.6741934 |
| 0 | ClusterWarmDataCapacityFactor | 0.130552847673333 |
| 0 | ExtentsTotal | 136 |
| 0 | DiskColdAllocationPercentage | 5 |
| 0 | InstancesTargetBasedOnDataCapacity | 2 |
| 0 | TotalOriginalDataSize | 5167628070 |
| 0 | TotalExtentSize | 1779165230 |
| 0 | IngestionsLoadFactor | 0 |
| 0 | IngestionsInProgress | 0 |
| 0 | IngestionsSuccessRate | 100 |
| 0 | MergesInProgress | 0 |
| 0 | BuildVersion | 1.0.6281.19882 |
| 0 | BuildTime | 2017-03-13 11:02:44.0000000 |
| 0 | ClusterDataCapacityFactor | 0.130552847673333 |
| 0 | IsDataWarmingRequired | False |
| 0 | RebalanceLastRunOn | 2017-03-21 09:14:53.8523455 |
| 0 | DataWarmingLastRunOn | 2017-03-21 09:19:54.1438800 |
| 0 | MergesSuccessRate | 100 |
| 0 | NotHealthyReason | [null] |
| 0 | IsAttentionRequired | False |
| 0 | AttentionRequiredReason | [null] |
| 0 | ProductVersion | KustoRelease_2017.03.13.2 |
| 0 | FailedIngestOperations | 0 |
| 0 | FailedMergeOperations | 0 |
| 0 | MaxExtentsInSingleTable | 64 |
| 0 | TableWithMaxExtents | KustoMonitoringPersistentDatabase.KustoMonitoringTable |
| 0 | WarmExtentSize | 1779165230 |
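
For a quick, self-contained sketch (not from the original article), the same rotation can be seen on a small inline table:

datatable(Name:string, Age:long, City:string)
[
    "John", 20, "Seattle"
]
| evaluate narrow()

Each (row, column) pair becomes one output row, so the single input row expands to three rows (Name/John, Age/20, and City/Seattle), with Row set to 0 and every Value rendered as a string.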

1.3 - pivot plugin

Learn how to use the pivot plugin to rotate a table with specified columns and aggregate the remaining columns.

Rotates a table by turning the unique values from one column in the input table into multiple columns in the output table and performs aggregations as required on any remaining column values that will appear in the final output.

Syntax

T | evaluate pivot(pivotColumn[, aggregationFunction] [,column1 [,column2]]) [: OutputSchema]

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| pivotColumn | string | ✔️ | The column to rotate. Each unique value from this column will be a column in the output table. |
| aggregationFunction | string | | An aggregation function used to aggregate multiple rows in the input table to a single row in the output table. Currently supported functions: min(), max(), take_any(), sum(), dcount(), avg(), stdev(), variance(), make_list(), make_bag(), make_set(), count(). The default is count(). |
| column1, column2, … | string | | A column name or comma-separated list of column names. The output table will contain an additional column per each specified column. The default is all columns other than the pivoted column and the aggregation column. |
| OutputSchema | | | The names and types for the expected columns of the pivot plugin output. Syntax: ( ColumnName : ColumnType [, …] ). Specifying the expected schema optimizes query execution by not having to first run the actual query to explore the schema. An error is raised if the run-time schema doesn’t match the OutputSchema schema. |

Returns

Pivot returns the rotated table with the specified columns (column1, column2, …) plus all unique values of the pivot column. Each cell for the pivoted columns contains the aggregate function computation.

Examples

Pivot by a column

For each EventType and State starting with ‘AL’, count the number of events of this type in this state.

StormEvents
| project State, EventType
| where State startswith "AL"
| where EventType has "Wind"
| evaluate pivot(State)

Output

| EventType | ALABAMA | ALASKA |
|--|--|--|
| Thunderstorm Wind | 352 | 1 |
| High Wind | 0 | 95 |
| Extreme Cold/Wind Chill | 0 | 10 |
| Strong Wind | 22 | 0 |

Pivot by a column with aggregation function

For each EventType and State starting with ‘AR’, display the total number of direct deaths.

StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect))

Output

| EventType | ARKANSAS | ARIZONA |
|--|--|--|
| Heavy Rain | 1 | 0 |
| Thunderstorm Wind | 1 | 0 |
| Lightning | 0 | 1 |
| Flash Flood | 0 | 6 |
| Strong Wind | 1 | 0 |
| Heat | 3 | 0 |

Pivot by a column with aggregation function and a single additional column

Result is identical to previous example.

StormEvents
| where State startswith "AR"
| project State, EventType, DeathsDirect
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType)

Output

| EventType | ARKANSAS | ARIZONA |
|--|--|--|
| Heavy Rain | 1 | 0 |
| Thunderstorm Wind | 1 | 0 |
| Lightning | 0 | 1 |
| Flash Flood | 0 | 6 |
| Strong Wind | 1 | 0 |
| Heat | 3 | 0 |

Specify the pivoted column, aggregation function, and multiple additional columns

For each event type, source, and state, sum the number of direct deaths.

StormEvents
| where State startswith "AR"
| where DeathsDirect > 0
| evaluate pivot(State, sum(DeathsDirect), EventType, Source)

Output

| EventType | Source | ARKANSAS | ARIZONA |
|--|--|--|--|
| Heavy Rain | Emergency Manager | 1 | 0 |
| Thunderstorm Wind | Emergency Manager | 1 | 0 |
| Lightning | Newspaper | 0 | 1 |
| Flash Flood | Trained Spotter | 0 | 2 |
| Flash Flood | Broadcast Media | 0 | 3 |
| Flash Flood | Newspaper | 0 | 1 |
| Strong Wind | Law Enforcement | 1 | 0 |
| Heat | Newspaper | 3 | 0 |

Pivot with a query-defined output schema

The following example selects specific columns in the StormEvents table. It uses an explicit schema definition that allows various optimizations to be evaluated before running the actual query.

StormEvents
| project State, EventType
| where EventType has "Wind"
| evaluate pivot(State): (EventType:string, ALABAMA:long, ALASKA:long)

Output

| EventType | ALABAMA | ALASKA |
|--|--|--|
| Thunderstorm Wind | 352 | 1 |
| High Wind | 0 | 95 |
| Marine Thunderstorm Wind | 0 | 0 |
| Strong Wind | 22 | 0 |
| Extreme Cold/Wind Chill | 0 | 10 |
| Cold/Wind Chill | 0 | 0 |
| Marine Strong Wind | 0 | 0 |
| Marine High Wind | 0 | 0 |

2 - General plugins

2.1 - dcount_intersect plugin

Learn how to use the dcount_intersect plugin to calculate the intersection between N sets based on hyper log log (hll) values.

Calculates intersection between N sets based on hll values (N in range of [2..16]), and returns N dcount values. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate dcount_intersect(hll_1, hll_2 [, hll_3, …])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| T | string | ✔️ | The input tabular expression. |
| hll_i | | | The values of set Si calculated with the hll() function. |

Returns

Returns a table with N dcount values (per column, representing set intersections). Column names are s0, s1, … (until n-1).

Given sets S1, S2, .. Sn return values will be representing distinct counts of:
S1,
S1 ∩ S2,
S1 ∩ S2 ∩ S3,
… ,
S1 ∩ S2 ∩ … ∩ Sn

Examples

// Generate numbers from 1 to 100
range x from 1 to 100 step 1
| extend isEven = (x % 2 == 0), isMod3 = (x % 3 == 0), isMod5 = (x % 5 == 0)
// Calculate conditional HLL values (note that '0' is included in each of them as additional value, so we will subtract it later)
| summarize hll_even = hll(iif(isEven, x, 0), 2),
            hll_mod3 = hll(iif(isMod3, x, 0), 2),
            hll_mod5 = hll(iif(isMod5, x, 0), 2) 
// Invoke the plugin that calculates dcount intersections         
| evaluate dcount_intersect(hll_even, hll_mod3, hll_mod5)
| project evenNumbers = s0 - 1,             //                               100 / 2 = 50
          even_and_mod3 = s1 - 1,           // lcm(2,3) = 6, therefore:      100 / 6 = 16
          even_and_mod3_and_mod5 = s2 - 1   // lcm(2,3,5) = 30, therefore:   100 / 30 = 3

Output

| evenNumbers | even_and_mod3 | even_and_mod3_and_mod5 |
|--|--|--|
| 50 | 16 | 3 |

2.2 - infer_storage_schema plugin

Learn how to use the infer_storage_schema plugin to infer the schema of external data.

This plugin infers the schema of external data and returns it as a CSL schema string. The string can be used when creating external tables. The plugin is invoked with the evaluate operator.

Authentication and authorization

In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.

The following table lists the supported authentication methods and any required permissions by storage type.

| Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
|--|--|--|
| Impersonation | Storage Blob Data Reader | Reader |
| Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
| Microsoft Entra access token | | |
| Storage account access key | | This authentication method isn’t supported in Gen1. |

Syntax

evaluate infer_storage_schema( Options )

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| Options | dynamic | ✔️ | A property bag specifying the properties of the request. |

Supported properties of the request

| Name | Type | Required | Description |
|--|--|--|--|
| StorageContainers | dynamic | ✔️ | An array of storage connection strings that represent prefix URI for stored data artifacts. |
| DataFormat | string | ✔️ | One of the supported data formats. |
| FileExtension | string | | If specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues. |
| FileNamePrefix | string | | If specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process. |
| Mode | string | | The schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last. |
| InferenceOptions | dynamic | | More inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true}. |

Returns

The infer_storage_schema plugin returns a single result table with a single row and column, holding the CSL schema string.

Example

let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
  ],
  'FileExtension': '.parquet',
  'FileNamePrefix': 'part-',
  'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)

Output

| CslSchema |
|--|
| app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real |

Use the returned schema in an external table definition:

.create external table MobileEvents(
    app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real
)
kind=blob
partition by (dt:datetime = bin(event_time, 1d), app:string = app_id)
pathformat = ('app=' app '/dt=' datetime_pattern('yyyyMMdd', dt))
dataformat = parquet
(
    h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
)
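
The InferenceOptions property applies only to delimited formats. As a hedged sketch (the storage container and secret below are hypothetical), inferring the schema of CSV files whose first row holds the column names could look like this:

let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.windows.net/SalesData;secretKey'
  ],
  'FileExtension': '.csv',
  'DataFormat': 'csv',
  'InferenceOptions': {'UseFirstRowAsHeader': true}
});
evaluate infer_storage_schema(options)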

2.3 - infer_storage_schema_with_suggestions plugin

Learn how to use the infer_storage_schema_with_suggestions plugin to infer the optimal schema of external data.

The infer_storage_schema_with_suggestions plugin infers the schema of external data and returns a JSON object. For each column, the object provides the inferred type, a recommended type, and the recommended mapping transformation. The recommended type and mapping are provided by suggestion logic that determines the optimal type as follows:

  • Identity columns: If the inferred type for a column is long and the column name ends with id, the suggested type is string since it provides optimized indexing for identity columns where equality filters are common.
  • Unix datetime columns: If the inferred type for a column is long and one of the unix-time to datetime mapping transformations produces a valid datetime value, the suggested type is datetime and the suggested ApplicableTransformationMapping mapping is the one that produced a valid datetime value.

The plugin is invoked with the evaluate operator. To obtain the inferred schema in the form used to create and alter Azure Storage external tables, without suggestions, use the infer_storage_schema plugin.

Authentication and authorization

In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.

The following table lists the supported authentication methods and any required permissions by storage type.

| Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
|--|--|--|
| Impersonation | Storage Blob Data Reader | Reader |
| Shared Access (SAS) token | List + Read | This authentication method isn’t supported in Gen1. |
| Microsoft Entra access token | | |
| Storage account access key | | This authentication method isn’t supported in Gen1. |

Syntax

evaluate infer_storage_schema_with_suggestions( Options )

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| Options | dynamic | ✔️ | A property bag specifying the properties of the request. |

Supported properties of the request

| Name | Type | Required | Description |
|--|--|--|--|
| StorageContainers | dynamic | ✔️ | An array of storage connection strings that represent prefix URI for stored data artifacts. |
| DataFormat | string | ✔️ | One of the data formats supported for ingestion. |
| FileExtension | string | | If specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues. |
| FileNamePrefix | string | | If specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process. |
| Mode | string | | The schema inference strategy. A value of: any, last, all. The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last. |
| InferenceOptions | dynamic | | More inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true}. |

Returns

The infer_storage_schema_with_suggestions plugin returns a single result table with a single row and column, holding a JSON string.

Example

let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
  ],
  'FileExtension': '.json',
  'FileNamePrefix': 'js-',
  'DataFormat': 'json'
});
evaluate infer_storage_schema_with_suggestions(options)

Example input data

    {
        "source": "DataExplorer",
        "created_at": "2022-04-10 15:47:57",
        "author_id": 739144091473215488,
        "time_millisec":1547083647000
    }

Output

{
  "Columns": [
    {
      "OriginalColumn": {
        "Name": "source",
        "CslType": {
          "type": "string",
          "IsNumeric": false,
          "IsSummable": false
        }
      },
      "RecommendedColumn": {
        "Name": "source",
        "CslType": {
          "type": "string",
          "IsNumeric": false,
          "IsSummable": false
        }
      },
      "ApplicableTransformationMapping": "None"
    },
    {
      "OriginalColumn": {
        "Name": "created_at",
        "CslType": {
          "type": "datetime",
          "IsNumeric": false,
          "IsSummable": true
        }
      },
      "RecommendedColumn": {
        "Name": "created_at",
        "CslType": {
          "type": "datetime",
          "IsNumeric": false,
          "IsSummable": true
        }
      },
      "ApplicableTransformationMapping": "None"
    },
    {
      "OriginalColumn": {
        "Name": "author_id",
        "CslType": {
          "type": "long",
          "IsNumeric": true,
          "IsSummable": true
        }
      },
      "RecommendedColumn": {
        "Name": "author_id",
        "CslType": {
          "type": "string",
          "IsNumeric": false,
          "IsSummable": false
        }
      },
      "ApplicableTransformationMapping": "None"
    },
    {
      "OriginalColumn": {
        "Name": "time_millisec",
        "CslType": {
          "type": "long",
          "IsNumeric": true,
          "IsSummable": true
        }
      },
      "RecommendedColumn": {
        "Name": "time_millisec",
        "CslType": {
          "type": "datetime",
          "IsNumeric": false,
          "IsSummable": true
        }
      },
      "ApplicableTransformationMapping": "DateTimeFromUnixMilliseconds"
    }
  ]
}
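
To act on these recommendations (this step isn't shown in the original article), you could create the target table with the recommended column types and apply the suggested transformation in a JSON ingestion mapping. The table and mapping names below are hypothetical; the transformation name is taken directly from the ApplicableTransformationMapping field above:

.create table MobileEventsSuggested (
    source:string,
    created_at:datetime,
    author_id:string,
    time_millisec:datetime
)

.create table MobileEventsSuggested ingestion json mapping "MobileEventsSuggested_Mapping" '[{"column":"source","path":"$.source"},{"column":"created_at","path":"$.created_at"},{"column":"author_id","path":"$.author_id"},{"column":"time_millisec","path":"$.time_millisec","transform":"DateTimeFromUnixMilliseconds"}]'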

2.4 - ipv4_lookup plugin

Learn how to use the ipv4_lookup plugin to look up an IPv4 value in a lookup table.

The ipv4_lookup plugin looks up an IPv4 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate ipv4_lookup( LookupTable , SourceIPv4Key , IPv4LookupKey [, ExtraKey1 [.. , ExtraKeyN [, return_unmatched ]]] )

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| T | string | ✔️ | The tabular input whose column SourceIPv4Key is used for IPv4 matching. |
| LookupTable | string | ✔️ | Table or tabular expression with IPv4 lookup data, whose column IPv4LookupKey is used for IPv4 matching. IPv4 values can be masked using IP-prefix notation. |
| SourceIPv4Key | string | ✔️ | The column of T with the IPv4 string to be looked up in LookupTable. IPv4 values can be masked using IP-prefix notation. |
| IPv4LookupKey | string | ✔️ | The column of LookupTable with the IPv4 string that is matched against each SourceIPv4Key value. |
| ExtraKey1 .. ExtraKeyN | string | | Additional column references that are used for lookup matches. Similar to join operation: records with equal values are considered matching. Column name references must exist both in source table T and LookupTable. |
| return_unmatched | bool | | A boolean flag that defines if the result should include all or only matching rows (default: false - only matching rows returned). |

Returns

The ipv4_lookup plugin returns a result of join (lookup) based on IPv4 key. The schema of the table is the union of the source table and the lookup table, similar to the result of the lookup operator.

If the return_unmatched argument is set to true, the resulting table includes both matched and unmatched rows (filled with nulls).

If the return_unmatched argument is set to false, or omitted (the default value of false is used), the resulting table has as many records as matching results. This variant of lookup has better performance compared to return_unmatched=true execution.

Examples

IPv4 lookup - matching rows only

// IP lookup table: IP_Data
// Partial data from: https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
  "111.68.128.0/17","AS","Asia","JP","Japan",
  "5.8.0.0/19","EU","Europe","RU","Russia",
  "223.255.254.0/24","AS","Asia","SG","Singapore",
  "46.36.200.51/32","OC","Oceania","CK","Cook Islands",
  "2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
  '2.20.183.12',   // United Kingdom
  '5.8.1.2',       // Russia
  '192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network)

Output

| ip | network | continent_code | continent_name | country_iso_code | country_name |
|--|--|--|--|--|--|
| 2.20.183.12 | 2.20.183.0/24 | EU | Europe | GB | United Kingdom |
| 5.8.1.2 | 5.8.0.0/19 | EU | Europe | RU | Russia |

IPv4 lookup - return both matching and nonmatching rows

// IP lookup table: IP_Data
// Partial data from: 
// https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv
let IP_Data = datatable(network:string,continent_code:string ,continent_name:string ,country_iso_code:string ,country_name:string )
[
    "111.68.128.0/17","AS","Asia","JP","Japan",
    "5.8.0.0/19","EU","Europe","RU","Russia",
    "223.255.254.0/24","AS","Asia","SG","Singapore",
    "46.36.200.51/32","OC","Oceania","CK","Cook Islands",
    "2.20.183.0/24","EU","Europe","GB","United Kingdom",
];
let IPs = datatable(ip:string)
[
    '2.20.183.12',   // United Kingdom
    '5.8.1.2',       // Russia
    '192.165.12.17', // Unknown
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)

Output

| ip | network | continent_code | continent_name | country_iso_code | country_name |
|--|--|--|--|--|--|
| 2.20.183.12 | 2.20.183.0/24 | EU | Europe | GB | United Kingdom |
| 5.8.1.2 | 5.8.0.0/19 | EU | Europe | RU | Russia |
| 192.165.12.17 | | | | | |

IPv4 lookup - using source in external_data()

let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
    ['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string)
[
    '2.20.183.12',   // United Kingdom
    '5.8.1.2',       // Russia
    '192.165.12.17', // Sweden
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, return_unmatched = true)

Output

| ip | network | geoname_id | continent_code | continent_name | country_iso_code | country_name | is_anonymous_proxy | is_satellite_provider |
|--|--|--|--|--|--|--|--|--|
| 2.20.183.12 | 2.20.183.0/24 | 2635167 | EU | Europe | GB | United Kingdom | 0 | 0 |
| 5.8.1.2 | 5.8.0.0/19 | 2017370 | EU | Europe | RU | Russia | 0 | 0 |
| 192.165.12.17 | 192.165.8.0/21 | 2661886 | EU | Europe | SE | Sweden | 0 | 0 |

IPv4 lookup - using extra columns for matching

let IP_Data = external_data(network:string,geoname_id:long,continent_code:string,continent_name:string ,country_iso_code:string,country_name:string,is_anonymous_proxy:bool,is_satellite_provider:bool)
    ['https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv'];
let IPs = datatable(ip:string, continent_name:string, country_iso_code:string)
[
    '2.20.183.12',   'Europe', 'GB', // United Kingdom
    '5.8.1.2',       'Europe', 'RU', // Russia
    '192.165.12.17', 'Europe', '',   // Sweden is 'SE' - so it won't be matched
];
IPs
| evaluate ipv4_lookup(IP_Data, ip, network, continent_name, country_iso_code)

Output

| ip | continent_name | country_iso_code | network | geoname_id | continent_code | country_name | is_anonymous_proxy | is_satellite_provider |
|--|--|--|--|--|--|--|--|--|
| 2.20.183.12 | Europe | GB | 2.20.183.0/24 | 2635167 | EU | United Kingdom | 0 | 0 |
| 5.8.1.2 | Europe | RU | 5.8.0.0/19 | 2017370 | EU | Russia | 0 | 0 |

2.5 - ipv6_lookup plugin

Learn how to use the ipv6_lookup plugin to look up an IPv6 address in a lookup table.

The ipv6_lookup plugin looks up an IPv6 value in a lookup table and returns rows with matched values. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate ipv6_lookup( LookupTable , SourceIPv6Key , IPv6LookupKey [, return_unmatched ] )

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| T | string | ✔️ | The tabular input whose column SourceIPv6Key is used for IPv6 matching. |
| LookupTable | string | ✔️ | Table or tabular expression with IPv6 lookup data, whose column IPv6LookupKey is used for IPv6 matching. IPv6 values can be masked using IP-prefix notation. |
| SourceIPv6Key | string | ✔️ | The column of T with the IPv6 string to be looked up in LookupTable. IPv6 values can be masked using IP-prefix notation. |
| IPv6LookupKey | string | ✔️ | The column of LookupTable with the IPv6 string that is matched against each SourceIPv6Key value. |
| return_unmatched | bool | | A boolean flag that defines if the result should include all or only matching rows (default: false - only matching rows returned). |

Returns

The ipv6_lookup plugin returns a result of join (lookup) based on IPv6 key. The schema of the table is the union of the source table and the lookup table, similar to the result of the lookup operator.

If the return_unmatched argument is set to true, the resulting table includes both matched and unmatched rows (filled with nulls).

If the return_unmatched argument is set to false, or omitted (the default value of false is used), the resulting table has as many records as matching results. This variant of lookup has better performance compared to return_unmatched=true execution.

Examples

IPv6 lookup - matching rows only

// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
  "2001:0db8:85a3::/48","NA","North America","US","United States",
  "2404:6800:4001::/48","AS","Asia","JP","Japan",
  "2a00:1450:4001::/48","EU","Europe","DE","Germany",
  "2800:3f0:4001::/48","SA","South America","BR","Brazil",
  "2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
  "2607:f8b0:4001::/48","NA","North America","CA","Canada",
  "2a02:26f0:4001::/48","EU","Europe","FR","France",
  "2400:cb00:4001::/48","AS","Asia","IN","India",
  "2801:0db8:85a3::/48","SA","South America","AR","Argentina",
  "2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
  "2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
  "2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
  "2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abcd",  // N/A
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abce"   // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network)

Output

| network | continent_code | continent_name | country_iso_code | country_name | ip |
|--|--|--|--|--|--|
| 2001:0db8:85a3::/48 | NA | North America | US | United States | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
| 2404:6800:4001::/48 | AS | Asia | JP | Japan | 2404:6800:4001:0001:0000:8a2e:0370:7334 |
| 2a02:26f0:4001::/48 | EU | Europe | FR | France | 2a02:26f0:4001:0006:0000:8a2e:0370:7334 |

IPv6 lookup - return both matching and nonmatching rows

// IP lookup table: IP_Data (the data is generated by ChatGPT).
let IP_Data = datatable(network:string, continent_code:string ,continent_name:string, country_iso_code:string, country_name:string)
[
  "2001:0db8:85a3::/48","NA","North America","US","United States",
  "2404:6800:4001::/48","AS","Asia","JP","Japan",
  "2a00:1450:4001::/48","EU","Europe","DE","Germany",
  "2800:3f0:4001::/48","SA","South America","BR","Brazil",
  "2c0f:fb50:4001::/48","AF","Africa","ZA","South Africa",
  "2607:f8b0:4001::/48","NA","North America","CA","Canada",
  "2a02:26f0:4001::/48","EU","Europe","FR","France",
  "2400:cb00:4001::/48","AS","Asia","IN","India",
  "2801:0db8:85a3::/48","SA","South America","AR","Argentina",
  "2a03:2880:4001::/48","EU","Europe","GB","United Kingdom"
];
let IPs = datatable(ip:string)
[
  "2001:0db8:85a3:0000:0000:8a2e:0370:7334", // United States
  "2404:6800:4001:0001:0000:8a2e:0370:7334", // Japan
  "2a02:26f0:4001:0006:0000:8a2e:0370:7334", // France
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abcd",  // N/A
  "a5e:f127:8a9d:146d:e102:b5d3:c755:abce"   // N/A
];
IPs
| evaluate ipv6_lookup(IP_Data, ip, network, true)

Output

| network | continent_code | continent_name | country_iso_code | country_name | ip |
|--|--|--|--|--|--|
| 2001:0db8:85a3::/48 | NA | North America | US | United States | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
| 2404:6800:4001::/48 | AS | Asia | JP | Japan | 2404:6800:4001:0001:0000:8a2e:0370:7334 |
| 2a02:26f0:4001::/48 | EU | Europe | FR | France | 2a02:26f0:4001:0006:0000:8a2e:0370:7334 |
| | | | | | a5e:f127:8a9d:146d:e102:b5d3:c755:abcd |
| | | | | | a5e:f127:8a9d:146d:e102:b5d3:c755:abce |

2.6 - preview plugin

Learn how to use the preview plugin to return two tables, one with the specified number of rows, and the other with the total number of records.

Returns a table with up to the specified number of rows from the input record set, and the total number of records in the input record set.

Syntax

T | evaluate preview(NumberOfRows)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| T | string | ✔️ | The table to preview. |
| NumberOfRows | int | ✔️ | The number of rows to preview from the table. |

Returns

The preview plugin returns two result tables:

  • A table with up to the specified number of rows. For example, T | evaluate preview(50) returns the same rows as T | take 50.
  • A table with a single row/column, holding the number of records in the input record set. This table is equivalent to running T | count.

Example

StormEvents | evaluate preview(5)

Table1

The following output table only includes the first 6 columns. To see the full result, run the query.

| StartTime | EndTime | EpisodeId | EventId | State | EventType | … |
|--|--|--|--|--|--|--|
| 2007-12-30T16:00:00Z | 2007-12-30T16:05:00Z | 11749 | 64588 | GEORGIA | Thunderstorm Wind | … |
| 2007-12-20T07:50:00Z | 2007-12-20T07:53:00Z | 12554 | 68796 | MISSISSIPPI | Thunderstorm Wind | … |
| 2007-09-29T08:11:00Z | 2007-09-29T08:11:00Z | 11091 | 61032 | ATLANTIC SOUTH | Waterspout | … |
| 2007-09-20T21:57:00Z | 2007-09-20T22:05:00Z | 11078 | 60913 | FLORIDA | Tornado | … |
| 2007-09-18T20:00:00Z | 2007-09-19T18:00:00Z | 11074 | 60904 | FLORIDA | Heavy Rain | … |

Table2

| Count |
|--|
| 59066 |

2.7 - schema_merge plugin

Learn how to use the schema_merge plugin to merge tabular schema definitions into a unified schema.

Merges tabular schema definitions into a unified schema.

Schema definitions are expected to be in the format produced by the getschema operator.

The schema merge operation joins columns in input schemas and tries to reduce data types to common ones. If data types can’t be reduced, an error is displayed on the problematic column.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate schema_merge(PreserveOrder)

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| PreserveOrder | bool | | When set to true, directs the plugin to validate the column order as defined by the first tabular schema that is kept. If the same column is in several schemas, the column ordinal must be like the column ordinal of the first schema that it appeared in. Default value is true. |

Returns

The schema_merge plugin returns output similar to what the getschema operator returns.

Examples

Merge with a schema that has a new column appended.

let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, HttpStatus:int, Referrer:string)[] | getschema;
union schema1, schema2 | evaluate schema_merge()

Output

| ColumnName | ColumnOrdinal | DataType | ColumnType |
|--|--|--|--|
| Uri | 0 | System.String | string |
| HttpStatus | 1 | System.Int32 | int |
| Referrer | 2 | System.String | string |

Merge with a schema that has different column ordering (HttpStatus ordinal changes from 1 to 2 in the new variant).

let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge()

Output

| ColumnName | ColumnOrdinal | DataType | ColumnType |
|--|--|--|--|
| Uri | 0 | System.String | string |
| Referrer | 1 | System.String | string |
| HttpStatus | -1 | ERROR(unknown CSL type:ERROR(columns are out of order)) | ERROR(columns are out of order) |

Merge with a schema that has different column ordering, but with PreserveOrder set to false.

let schema1 = datatable(Uri:string, HttpStatus:int)[] | getschema;
let schema2 = datatable(Uri:string, Referrer:string, HttpStatus:int)[] | getschema;
union schema1, schema2 | evaluate schema_merge(PreserveOrder = false)

Output

| ColumnName | ColumnOrdinal | DataType | ColumnType |
|--|--|--|--|
| Uri | 0 | System.String | string |
| Referrer | 1 | System.String | string |
| HttpStatus | 2 | System.Int32 | int |

3 - Language plugins

3.1 - Python plugin

Learn how to use the Python plugin to run user-defined functions using a Python script.

3.2 - Python plugin packages

Learn about the Python packages available in the Python plugin.

This article lists the available Python packages in the Python plugin. For more information, see Python plugin.

3.11.7 (Preview)

Python engine 3.11.7 + common data science and ML packages

PackageVersion
annotated-types0.6.0
anytree2.12.1
arrow1.3.0
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gensim4.3.2
humanfriendly10.0
idna3.7
importlib_metadata7.1.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
locket1.0.0
lxml5.2.1
marisa-trie1.1.0
MarkupSafe2.1.5
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
preshed3.0.9
protobuf5.26.1
psutil5.9.8
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
pyparsing3.1.2
pyreadline33.4.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
PyYAML6.0.1
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rpds-py0.18.0
scikit-learn1.4.2
scipy1.13.0
sip6.8.3
six1.16.0
smart-open6.4.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
tldextract5.1.2
toolz0.12.1
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.11.7 DL (Preview)

Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)

PackageVersion
absl-py2.1.0
alembic1.13.1
aniso86019.0.1
annotated-types0.6.0
anytree2.12.1
arch7.0.0
arrow1.3.0
astunparse1.6.3
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
cachetools5.3.3
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
Deprecated1.2.14
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
docker7.1.0
entrypoints0.4
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gast0.5.4
gensim4.3.2
gitdb4.0.11
GitPython3.1.43
google-pasta0.2.0
graphene3.3
graphql-core3.2.3
graphql-relay3.2.0
greenlet3.0.3
grpcio1.64.0
h5py3.11.0
humanfriendly10.0
idna3.7
importlib-metadata7.0.0
iniconfig2.0.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
keras3.3.3
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
libclang18.1.1
locket1.0.0
lxml5.2.1
Mako1.3.5
marisa-trie1.1.0
Markdown3.6
markdown-it-py3.0.0
MarkupSafe2.1.5
mdurl0.1.2
ml-dtypes0.3.2
mlflow2.13.0
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
namex0.0.8
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
opentelemetry-api1.24.0
opentelemetry-sdk1.24.0
opentelemetry-semantic-conventions0.45b0
opt-einsum3.3.0
optree0.11.0
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
pluggy1.5.0
preshed3.0.9
protobuf4.25.3
psutil5.9.8
pyarrow15.0.2
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
Pygments2.18.0
pyparsing3.1.2
pyreadline33.4.1
pytest8.2.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
pywin32306
PyYAML6.0.1
querystring-parser1.2.4
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rich13.7.1
rpds-py0.18.0
rstl0.1.3
scikit-learn1.4.2
scipy1.13.0
seasonal0.3.1
sip6.8.3
six1.16.0
smart-open6.4.0
smmap5.0.1
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
SQLAlchemy2.0.30
sqlparse0.5.0
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
tensorboard2.16.2
tensorboard-data-server0.7.2
tensorflow2.16.1
tensorflow-intel2.16.1
tensorflow-io-gcs-filesystem0.31.0
termcolor2.4.0
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
time-series-anomaly-detector0.2.7
tldextract5.1.2
toolz0.12.1
torch2.2.2
torchaudio2.2.2
torchvision0.17.2
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
waitress3.0.0
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
wrapt1.16.0
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.10.8

Python engine 3.10.8 + common data science and ML packages

PackageVersion
alembic1.11.1
anytree2.8.0
arrow1.2.3
attrs22.2.0
blis0.7.9
Bottleneck1.3.5
Brotli1.0.9
brotlipy0.7.0
catalogue2.0.8
certifi2022.12.7
cffi1.15.1
chardet5.0.0
charset-normalizer2.1.1
click8.1.3
cloudpickle2.2.1
colorama0.4.6
coloredlogs15.0.1
confection0.0.4
contourpy1.0.7
cycler0.11.0
cymem2.0.7
Cython0.29.28
daal2021.6.0
daal4py2021.6.3
dask2022.10.2
databricks-cli0.17.7
diff-match-patch20200713
dill0.3.6
distributed2022.10.2
docker6.1.3
entrypoints0.4
filelock3.9.1
flashtext2.7
Flask2.2.3
Flask-Compress1.13
flatbuffers23.3.3
fonttools4.39.0
fsspec2023.3.0
gensim4.2.0
gitdb4.0.10
GitPython3.1.31
greenlet2.0.2
HeapDict1.0.1
humanfriendly10.0
idna3.4
importlib-metadata6.7.0
intervaltree3.1.0
itsdangerous2.1.2
jellyfish0.9.0
Jinja23.1.2
jmespath1.0.1
joblib1.2.0
json50.9.10
jsonschema4.16.0
kiwisolver1.4.4
langcodes3.3.0
locket1.0.0
lxml4.9.1
Mako1.2.4
Markdown3.4.3
MarkupSafe2.1.2
mlflow2.4.1
mlxtend0.21.0
mpmath1.3.0
msgpack1.0.5
murmurhash1.0.9
networkx2.8.7
nltk3.7
numpy1.23.4
oauthlib3.2.2
onnxruntime1.13.1
packaging23.0
pandas1.5.1
partd1.3.0
pathy0.10.1
patsy0.5.3
Pillow9.4.0
pip23.0.1
platformdirs2.5.2
plotly5.11.0
ply3.11
preshed3.0.8
protobuf4.22.1
psutil5.9.3
pyarrow12.0.1
pycparser2.21
pydantic1.10.6
pyfpgrowth1.0
PyJWT2.7.0
pyparsing3.0.9
pyreadline33.4.1
pyrsistent0.19.3
python-dateutil2.8.2
pytz2022.7.1
PyWavelets1.4.1
pywin32306
PyYAML6.0
querystring-parser1.2.4
queuelib1.6.2
regex2022.10.31
requests2.28.2
requests-file1.5.1
scikit-learn1.1.3
scipy1.9.3
setuptools67.6.0
sip6.7.3
six1.16.0
smart-open6.3.0
smmap5.0.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.4.2
spacy-legacy3.0.12
spacy-loggers1.0.4
SQLAlchemy2.0.18
sqlparse0.4.4
srsly2.4.5
statsmodels0.13.2
sympy1.11.1
tabulate0.9.0
tbb2021.7.1
tblib1.7.0
tenacity8.2.2
textdistance4.5.0
thinc8.1.9
threadpoolctl3.1.0
three-merge0.1.1
tldextract3.4.0
toml0.10.2
toolz0.12.0
tornado6.1
tqdm4.65.0
typer0.4.2
typing_extensions4.5.0
ujson5.5.0
Unidecode1.3.6
urllib31.26.15
waitress2.1.2
wasabi0.10.1
websocket-client1.6.1
Werkzeug2.2.3
wheel0.40.0
xarray2022.10.0
zict2.2.0
zipp3.15.0

3.10.8 DL

Not supported

3.6.5 (Legacy)

Not supported

This article lists the available managed Python packages in the Python plugin. For more information, see Python plugin.

To create a custom image, see Create a custom image.

3.11.7 (Preview)

Python engine 3.11.7 + common data science and ML packages

PackageVersion
annotated-types0.6.0
anytree2.12.1
arrow1.3.0
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gensim4.3.2
humanfriendly10.0
idna3.7
importlib_metadata7.1.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
locket1.0.0
lxml5.2.1
marisa-trie1.1.0
MarkupSafe2.1.5
matplotlib3.8.4
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
preshed3.0.9
protobuf5.26.1
psutil5.9.8
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
pyparsing3.1.2
pyreadline33.4.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
PyYAML6.0.1
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rpds-py0.18.0
scikit-learn1.4.2
scipy1.13.0
sip6.8.3
six1.16.0
smart-open6.4.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
tldextract5.1.2
toolz0.12.1
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.11.7 DL (Preview)

Python engine 3.11.7 + common data science and ML packages + deep learning packages (tensorflow & torch)

PackageVersion
absl-py2.1.0
alembic1.13.1
aniso86019.0.1
annotated-types0.6.0
anytree2.12.1
arch7.0.0
arrow1.3.0
astunparse1.6.3
attrs23.2.0
blinker1.7.0
blis0.7.11
Bottleneck1.3.8
Brotli1.1.0
brotlipy0.7.0
cachetools5.3.3
catalogue2.0.10
certifi2024.2.2
cffi1.16.0
chardet5.2.0
charset-normalizer3.3.2
click8.1.7
cloudpathlib0.16.0
cloudpickle3.0.0
colorama0.4.6
coloredlogs15.0.1
confection0.1.4
contourpy1.2.1
cycler0.12.1
cymem2.0.8
Cython3.0.10
daal2024.3.0
daal4py2024.3.0
dask2024.4.2
Deprecated1.2.14
diff-match-patch20230430
dill0.3.8
distributed2024.4.2
docker7.1.0
entrypoints0.4
filelock3.13.4
flashtext2.7
Flask3.0.3
Flask-Compress1.15
flatbuffers24.3.25
fonttools4.51.0
fsspec2024.3.1
gast0.5.4
gensim4.3.2
gitdb4.0.11
GitPython3.1.43
google-pasta0.2.0
graphene3.3
graphql-core3.2.3
graphql-relay3.2.0
greenlet3.0.3
grpcio1.64.0
h5py3.11.0
humanfriendly10.0
idna3.7
importlib-metadata7.0.0
iniconfig2.0.0
intervaltree3.1.0
itsdangerous2.2.0
jellyfish1.0.3
Jinja23.1.3
jmespath1.0.1
joblib1.4.0
json50.9.25
jsonschema4.21.1
jsonschema-specifications2023.12.1
keras3.3.3
kiwisolver1.4.5
langcodes3.4.0
language_data1.2.0
libclang18.1.1
locket1.0.0
lxml5.2.1
Mako1.3.5
marisa-trie1.1.0
Markdown3.6
markdown-it-py3.0.0
MarkupSafe2.1.5
matplotlib3.8.4
mdurl0.1.2
ml-dtypes0.3.2
mlflow2.13.0
mlxtend0.23.1
mpmath1.3.0
msgpack1.0.8
murmurhash1.0.10
namex0.0.8
networkx3.3
nltk3.8.1
numpy1.26.4
onnxruntime1.17.3
opentelemetry-api1.24.0
opentelemetry-sdk1.24.0
opentelemetry-semantic-conventions0.45b0
opt-einsum3.3.0
optree0.11.0
packaging24.0
pandas2.2.2
partd1.4.1
patsy0.5.6
pillow10.3.0
platformdirs4.2.1
plotly5.21.0
pluggy1.5.0
preshed3.0.9
protobuf4.25.3
psutil5.9.8
pyarrow15.0.2
pycparser2.22
pydantic2.7.1
pydantic_core2.18.2
pyfpgrowth1.0
Pygments2.18.0
pyparsing3.1.2
pyreadline33.4.1
pytest8.2.1
python-dateutil2.9.0.post0
pytz2024.1
PyWavelets1.6.0
pywin32306
PyYAML6.0.1
querystring-parser1.2.4
queuelib1.6.2
referencing0.35.0
regex2024.4.16
requests2.31.0
requests-file2.0.0
rich13.7.1
rpds-py0.18.0
rstl0.1.3
scikit-learn1.4.2
scipy1.13.0
seasonal0.3.1
sip6.8.3
six1.16.0
smart-open6.4.0
smmap5.0.1
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.7.4
spacy-legacy3.0.12
spacy-loggers1.0.5
SQLAlchemy2.0.30
sqlparse0.5.0
srsly2.4.8
statsmodels0.14.2
sympy1.12
tbb2021.12.0
tblib3.0.0
tenacity8.2.3
tensorboard2.16.2
tensorboard-data-server0.7.2
tensorflow2.16.1
tensorflow-intel2.16.1
tensorflow-io-gcs-filesystem0.31.0
termcolor2.4.0
textdistance4.6.2
thinc8.2.3
threadpoolctl3.4.0
three-merge0.1.1
time-series-anomaly-detector0.2.7
tldextract5.1.2
toolz0.12.1
torch2.2.2
torchaudio2.2.2
torchvision0.17.2
tornado6.4
tqdm4.66.2
typer0.9.4
types-python-dateutil2.9.0.20240316
typing_extensions4.11.0
tzdata2024.1
ujson5.9.0
Unidecode1.3.8
urllib32.2.1
waitress3.0.0
wasabi1.1.2
weasel0.3.4
Werkzeug3.0.2
wrapt1.16.0
xarray2024.3.0
zict3.0.0
zipp3.18.1
zstandard0.22.0

3.10.8

Python engine 3.10.8 + common data science and ML packages

PackageVersion
alembic1.11.1
anytree2.8.0
arrow1.2.3
attrs22.2.0
blis0.7.9
Bottleneck1.3.5
Brotli1.0.9
brotlipy0.7.0
catalogue2.0.8
certifi2022.12.7
cffi1.15.1
chardet5.0.0
charset-normalizer2.1.1
click8.1.3
cloudpickle2.2.1
colorama0.4.6
coloredlogs15.0.1
confection0.0.4
contourpy1.0.7
cycler0.11.0
cymem2.0.7
Cython0.29.28
daal2021.6.0
daal4py2021.6.3
dask2022.10.2
databricks-cli0.17.7
diff-match-patch20200713
dill0.3.6
distributed2022.10.2
docker6.1.3
entrypoints0.4
filelock3.9.1
flashtext2.7
Flask2.2.3
Flask-Compress1.13
flatbuffers23.3.3
fonttools4.39.0
fsspec2023.3.0
gensim4.2.0
gitdb4.0.10
GitPython3.1.31
greenlet2.0.2
HeapDict1.0.1
humanfriendly10.0
idna3.4
importlib-metadata6.7.0
intervaltree3.1.0
itsdangerous2.1.2
jellyfish0.9.0
Jinja23.1.2
jmespath1.0.1
joblib1.2.0
json50.9.10
jsonschema4.16.0
kiwisolver1.4.4
langcodes3.3.0
locket1.0.0
lxml4.9.1
Mako1.2.4
Markdown3.4.3
MarkupSafe2.1.2
mlflow2.4.1
mlxtend0.21.0
mpmath1.3.0
msgpack1.0.5
murmurhash1.0.9
networkx2.8.7
nltk3.7
numpy1.23.4
oauthlib3.2.2
onnxruntime1.13.1
packaging23.0
pandas1.5.1
partd1.3.0
pathy0.10.1
patsy0.5.3
Pillow9.4.0
pip23.0.1
platformdirs2.5.2
plotly5.11.0
ply3.11
preshed3.0.8
protobuf4.22.1
psutil5.9.3
pyarrow12.0.1
pycparser2.21
pydantic1.10.6
pyfpgrowth1.0
PyJWT2.7.0
pyparsing3.0.9
pyreadline33.4.1
pyrsistent0.19.3
python-dateutil2.8.2
pytz2022.7.1
PyWavelets1.4.1
pywin32306
PyYAML6.0
querystring-parser1.2.4
queuelib1.6.2
regex2022.10.31
requests2.28.2
requests-file1.5.1
scikit-learn1.1.3
scipy1.9.3
setuptools67.6.0
sip6.7.3
six1.16.0
smart-open6.3.0
smmap5.0.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.4.2
spacy-legacy3.0.12
spacy-loggers1.0.4
SQLAlchemy2.0.18
sqlparse0.4.4
srsly2.4.5
statsmodels0.13.2
sympy1.11.1
tabulate0.9.0
tbb2021.7.1
tblib1.7.0
tenacity8.2.2
textdistance4.5.0
thinc8.1.9
threadpoolctl3.1.0
three-merge0.1.1
tldextract3.4.0
toml0.10.2
toolz0.12.0
tornado6.1
tqdm4.65.0
typer0.4.2
typing_extensions4.5.0
ujson5.5.0
Unidecode1.3.6
urllib31.26.15
waitress2.1.2
wasabi0.10.1
websocket-client1.6.1
Werkzeug2.2.3
wheel0.40.0
xarray2022.10.0
zict2.2.0
zipp3.15.0

3.10.8 DL

Python engine 3.10.8 + common data science and ML packages + deep learning packages (tensorflow & torch)

PackageVersion
absl-py1.4.0
alembic1.11.1
anytree2.8.0
arrow1.2.3
astunparse1.6.3
attrs22.1.0
blis0.7.9
Bottleneck1.3.5
Brotli1.0.9
brotlipy0.7.0
cachetools5.3.0
catalogue2.0.8
certifi2022.9.24
cffi1.15.1
chardet5.0.0
charset-normalizer2.1.1
click8.1.3
cloudpickle2.2.0
colorama0.4.6
coloredlogs15.0.1
confection0.0.3
contourpy1.0.6
cycler0.11.0
cymem2.0.7
Cython0.29.28
daal2021.6.0
daal4py2021.6.3
dask2022.10.2
databricks-cli0.17.7
diff-match-patch20200713
dill0.3.6
distributed2022.10.2
docker6.1.3
entrypoints0.4
filelock3.8.0
flashtext2.7
Flask2.2.2
Flask-Compress1.13
flatbuffers22.10.26
fonttools4.38.0
fsspec2022.10.0
gast0.4.0
gensim4.2.0
gitdb4.0.10
GitPython3.1.31
google-auth2.16.2
google-auth-oauthlib0.4.6
google-pasta0.2.0
greenlet2.0.2
grpcio1.51.3
h5py3.8.0
HeapDict1.0.1
humanfriendly10.0
idna3.4
importlib-metadata6.7.0
intervaltree3.1.0
itsdangerous2.1.2
jax0.4.6
jellyfish0.9.0
Jinja23.1.2
jmespath1.0.1
joblib1.2.0
json50.9.10
jsonschema4.16.0
keras2.12.0
kiwisolver1.4.4
langcodes3.3.0
libclang16.0.0
locket1.0.0
lxml4.9.1
Mako1.2.4
Markdown3.4.2
MarkupSafe2.1.1
mlflow2.4.1
mlxtend0.21.0
mpmath1.2.1
msgpack1.0.4
murmurhash1.0.9
networkx2.8.7
nltk3.7
numpy1.23.4
oauthlib3.2.2
onnxruntime1.13.1
opt-einsum3.3.0
packaging21.3
pandas1.5.1
partd1.3.0
pathy0.6.2
patsy0.5.3
Pillow9.3.0
pip23.0.1
platformdirs2.5.2
plotly5.11.0
ply3.11
preshed3.0.8
protobuf4.21.9
psutil5.9.3
pyarrow12.0.1
pyasn10.4.8
pyasn1-modules0.2.8
pycparser2.21
pydantic1.10.2
pyfpgrowth1.0
PyJWT2.7.0
pyparsing3.0.9
pyreadline33.4.1
pyrsistent0.19.1
python-dateutil2.8.2
pytz2022.5
PyWavelets1.4.1
pywin32306
PyYAML6.0
querystring-parser1.2.4
queuelib1.6.2
regex2022.10.31
requests2.28.1
requests-file1.5.1
requests-oauthlib1.3.1
rsa4.9
scikit-learn1.1.3
scipy1.9.3
setuptools67.6.0
sip6.7.3
six1.16.0
smart-open5.2.1
smmap5.0.0
snowballstemmer2.2.0
sortedcollections2.1.0
sortedcontainers2.4.0
spacy3.4.2
spacy-legacy3.0.10
spacy-loggers1.0.3
SQLAlchemy2.0.18
sqlparse0.4.4
srsly2.4.5
statsmodels0.13.2
sympy1.11.1
tabulate0.9.0
tbb2021.7.0
tblib1.7.0
tenacity8.1.0
tensorboard2.12.0
tensorboard-data-server0.7.0
tensorboard-plugin-wit1.8.1
tensorflow2.12.0
tensorflow-estimator2.12.0
tensorflow-intel2.12.0
tensorflow-io-gcs-filesystem0.31.0
termcolor2.2.0
textdistance4.5.0
thinc8.1.5
threadpoolctl3.1.0
three-merge0.1.1
tldextract3.4.0
toml0.10.2
toolz0.12.0
torch2.0.0
torchaudio2.0.1
torchvision0.15.1
tornado6.1
tqdm4.64.1
typer0.4.2
typing_extensions4.4.0
ujson5.5.0
Unidecode1.3.6
urllib31.26.12
waitress2.1.2
wasabi0.10.1
websocket-client1.6.1
Werkzeug2.2.2
wheel0.40.0
wrapt1.14.1
xarray2022.10.0
zict2.2.0
zipp3.15.0

3.6.5 (Legacy)

PackageVersion
adal1.2.0
anaconda_navigator1.8.7
anytree2.8.0
argparse1.1
asn1crypto0.24.0
astor0.7.1
astroid1.6.3
astropy3.0.2
attr18.1.0
babel2.5.3
backcall0.1.0
bitarray0.8.1
bleach2.1.3
bokeh0.12.16
boto2.48.0
boto31.9.109
botocore1.12.109
bottleneck1.2.1
bs44.6.0
certifi2018.04.16
cffi1.11.5
cgi2.6
chardet3.0.4
click6.7
cloudpickle0.5.3
clyent1.2.2
colorama0.3.9
conda4.5.4
conda_build3.10.5
conda_env4.5.4
conda_verify2.0.0
Crypto2.6.1
cryptography2.2.2
csv1
ctypes1.1.0
cycler0.10.0
cython0.28.2
Cython0.28.2
cytoolz0.9.0.1
dask0.17.5
datashape0.5.4
dateutil2.7.3
decimal1.7
decorator4.3.0
dill0.2.8.2
distributed1.21.8
distutils3.6.5
docutils0.14
entrypoints0.2.3
et_xmlfile1.0.1
fastcache1.0.2
filelock3.0.4
flask1.0.2
flask_cors3.0.4
future0.17.1
gensim3.7.1
geohash0.8.5
gevent1.3.0
glob2“(0, 6)”
greenlet0.4.13
h5py2.7.1
html5lib1.0.1
idna2.6
imageio2.3.0
imaplib2.58
ipaddress1
IPython6.4.0
ipython_genutils0.2.0
isort4.3.4
jdcal1.4
jedi0.12.0
jinja22.1
jmespath0.9.4
joblib0.13.0
json2.0.9
jsonschema2.6.0
jupyter_core4.4.0
jupyterlab0.32.1
jwt1.7.1
keras2.2.4
keras_applications1.0.6
keras_preprocessing1.0.5
kiwisolver1.0.1
lazy_object_proxy1.3.1
llvmlite0.23.1
logging0.5.1.2
markdown3.0.1
markupsafe1
matplotlib2.2.2
mccabe0.6.1
menuinst1.4.14
mistune0.8.3
mkl1.1.2
mlxtend0.15.0.0
mpmath1.0.0
msrest0.6.2
msrestazure0.6.0
multipledispatch0.5.0
navigator_updater0.2.1
nbconvert5.3.1
nbformat4.4.0
networkx2.1
nltk3.3
nose1.3.7
notebook5.5.0
numba0.38.0
numexpr2.6.5
numpy1.19.1
numpydoc0.8.0
oauthlib2.1.0
olefile0.45.1
onnxruntime1.4.0
openpyxl2.5.3
OpenSSL18.0.0
optparse1.5.3
packaging17.1
pandas0.24.1
parso0.2.0
past0.17.1
path11.0.1
patsy0.5.0
pep81.7.1
phonenumbers8.10.6
pickleshare0.7.4
PIL5.1.0
pint0.8.1
pip21.3.1
plac0.9.6
platform1.0.8
plotly4.8.2
pluggy0.6.0
ply3.11
prompt_toolkit1.0.15
psutil5.4.5
py1.5.3
pycodestyle2.4.0
pycosat0.6.3
pycparser2.18
pyflakes1.6.0
pyfpgrowth1
pygments2.2.0
pylint1.8.4
pyparsing2.2.0
pytest3.5.1
pytest_arraydiff0.2
pytz2018.4
pywt0.5.2
qtconsole4.3.1
re2.2.1
regex2.4.136
requests2.18.4
requests_oauthlib1.0.0
ruamel_yaml0.15.35
s3transfer0.2.0
sandbox_utils1.2
scipy1.1.0
scrubadub1.2.0
setuptools39.1.0
six1.11.0
sklearn0.20.3
socketserver0.4
socks1.6.7
sortedcollections0.6.1
sortedcontainers1.5.10
spacy2.0.18
sphinx1.7.4
spyder3.2.8
sqlalchemy1.2.7
statsmodels0.9.0
surprise1.0.6
sympy1.1.1
tables3.4.3
tabnanny6
tblib1.3.2
tensorflow1.12.0
terminado0.8.1
testpath0.3.1
textblob0.10.0
tlz0.9.0.1
toolz0.9.0
torch1.0.0
tqdm4.31.1
traitlets4.3.2
ujson1.35
unicodecsv0.14.1
urllib31.22
werkzeug0.14.1
wheel0.31.1
widgetsnbextension3.2.1
win32rcparser0.11
winpty0.5.1
wrapt1.10.11
xgboost0.81
xlsxwriter1.0.4
yaml3.12
zict0.1.3

3.3 - R plugin (Preview)

Learn how to use the R plugin (Preview) to run a user-defined function using an R script.

The R plugin runs a user-defined function (UDF) using an R script.

The script gets tabular data as its input, and produces tabular output. The plugin’s runtime is hosted in a sandbox on the cluster’s nodes. The sandbox provides an isolated and secure environment.

Syntax

T | evaluate [hint.distribution = (single | per_node)] r(output_schema, script [, script_parameters] [, external_artifacts])

Parameters

| Name | Type | Required | Description |
|--|--|--|--|
| output_schema | string | ✔️ | A type literal that defines the output schema of the tabular data, returned by the R code. The format is: typeof(ColumnName: ColumnType[, …]). For example: typeof(col1:string, col2:long). To extend the input schema, use the following syntax: typeof(*, col1:string, col2:long). |
| script | string | ✔️ | The valid R script to be executed. |
| script_parameters | dynamic | | A property bag of name and value pairs to be passed to the R script as the reserved kargs dictionary. For more information, see Reserved R variables. |
| hint.distribution | string | | Hint for the plugin’s execution to be distributed across multiple cluster nodes. The default value is single. single means that a single instance of the script will run over the entire query data. per_node means that if the query before the R block is distributed, an instance of the script will run on each node over the data that it contains. |
| external_artifacts | dynamic | | A property bag of name and URL pairs for artifacts that are accessible from cloud storage. They can be made available for the script to use at runtime. URLs referenced in this property bag are required to be included in the cluster’s callout policy and in a publicly available location, or contain the necessary credentials, as explained in storage connection strings. The artifacts are made available for the script to consume from a local temporary directory, .\Temp. The names provided in the property bag are used as the local file names. See the sketch that follows this table. For more information, see Install packages for the R plugin. |

Reserved R variables

The following variables are reserved for interaction between Kusto Query Language and the R code:

  • df: The input tabular data (the values of T above), as an R DataFrame.
  • kargs: The value of the script_parameters argument, as an R dictionary.
  • result: An R DataFrame created by the R script. The value becomes the tabular data that gets sent to any Kusto query operator that follows the plugin.
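
For illustration, here's a minimal sketch that uses all three reserved variables (the column and parameter names are arbitrary):

range x from 1 to 10 step 1
| evaluate r(
    typeof(*, scaled:double),                //  Output schema: append a scaled column
    'result <- df\n'                         //  df is the input table
    'result$scaled <- df$x * kargs$factor',  //  kargs carries the script_parameters values
    bag_pack('factor', 2.5))                 //  becomes kargs; result is returned as the plugin output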

Enable the plugin

R sandbox image

Examples

range x from 1 to 360 step 1
| evaluate r(
//
typeof(*, fx:double),               //  Output schema: append a new fx column to original table 
//
'result <- df\n'                    //  The R decorated script
'n <- nrow(df)\n'
'g <- kargs$gain\n'
'f <- kargs$cycles\n'
'result$fx <- g * sin(df$x / n * 2 * pi * f)'
//
, bag_pack('gain', 100, 'cycles', 4)    //  dictionary of parameters
)
| render linechart 

Sine demo.

Performance tips

  • Reduce the plugin’s input dataset to the minimum amount required (columns/rows).

  • Use filters on the source dataset using the Kusto Query Language, when possible.

  • To make a calculation on a subset of the source columns, project only those columns before invoking the plugin.

  • Use hint.distribution = per_node whenever the logic in your script is distributable.

  • You can also use the partition operator for partitioning the input dataset.

  • Whenever possible, use the Kusto Query Language to implement the logic of your R script.

    For example:

    .show operations
    | where StartedOn > ago(1d) // Filtering out irrelevant records before invoking the plugin
    | project d_seconds = Duration / 1s // Projecting only a subset of the necessary columns
    | evaluate hint.distribution = per_node r( // Using per_node distribution, as the script's logic allows it
        typeof(*, d2:double),
        'result <- df\n'
        'result$d2 <- df$d_seconds\n' // Negative example: this logic should have been written using Kusto's query language
      )
    | summarize avg = avg(d2)
    

Usage tips

  • To avoid conflicts between Kusto string delimiters and R string delimiters:

    • Use single quote characters (') for Kusto string literals in Kusto queries.
    • Use double quote characters (") for R string literals in R scripts.
  • Use the external data operator to obtain the content of a script that you’ve stored in an external location, such as Azure blob storage or a public GitHub repository.

    For example:

    let script = 
        externaldata(script:string)
        [h'https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r']
        with(format = raw);
    range x from 1 to 360 step 1
    | evaluate r(
        typeof(*, fx:double),
        toscalar(script), 
        bag_pack('gain', 100, 'cycles', 4))
    | render linechart 
    

Install packages for the R plugin

Follow these step-by-step instructions to install packages that aren’t included in the plugin’s base image.

Prerequisites

  1. Create a blob container to host the packages, preferably in the same region as your cluster. For example, https://artifactswestus.blob.core.windows.net/r, assuming your cluster is in West US.

  2. Alter the cluster’s callout policy to allow access to that location.

    • This change requires AllDatabasesAdmin permissions.

    • For example, to enable access to a blob located in https://artifactswestus.blob.core.windows.net/r, run the following command:

    .alter-merge cluster policy callout @'[ { "CalloutType": "sandbox_artifacts", "CalloutUriRegex": "artifactswestus\\.blob\\.core\\.windows\\.net/r/","CanCall": true } ]'
    

Install packages

The example snippets below assume a local R machine on a Windows environment.

  1. Verify you’re using the appropriate R version – current R Sandbox version is 3.4.4:

    > R.Version()["version.string"]
    
    $version.string
    [1] "R version 3.4.4 (2018-03-15)"
    

    If needed you can download it from here.

  2. Launch the x64 RGui

  3. Create a new empty folder to be populated with all the relevant packages you would like to install. In this example, we install the brglm2 package, so we create “C:\brglm2”.

  4. Add the newly created folder path to lib paths:

    > .libPaths("C://brglm2")
    
  5. Verify that the new folder is now the first path in .libPaths():

    > .libPaths()
    
    [1] "C:/brglm2"    "C:/Program Files/R/R-3.4.4/library"
    
  6. Once this setup is done, any package that we install will be added to this new folder. Install the requested package and its dependencies:

    > install.packages("brglm2")
    

    If the question “Do you want to install from sources the packages which need compilation?” appears, answer “Y”.

  7. Verify that new folders were added to “C:\brglm2”:

    Screenshot of library directory content.

  8. Select all items in that folder and zip them to e.g. libs.zip (do not zip the parent folder). You should get an archive structure like this:

    libs.zip:

    • brglm2 (folder)
    • enrichwith (folder)
    • numDeriv (folder)
  9. Upload libs.zip to the blob container that was set up above.

  10. Call the r plugin.

    • Specify the external_artifacts parameter with a property bag of name and reference to the ZIP file (the blob’s URL, including a SAS token).
    • In your inline R code, load the sandboxutils library and call zipfile.install() with the name of the ZIP file.

Example

Install the brglm2 package:

print x=1
| evaluate r(typeof(*, ver:string),
    'library(sandboxutils)\n'
    'zipfile.install("brglm2.zip")\n'
    'library("brglm2")\n'
    'result <- df\n'
    'result$ver <-packageVersion("brglm2")\n'
    ,external_artifacts=bag_pack('brglm2.zip', 'https://artifactswestus.blob.core.windows.net/r/libs.zip?*** REPLACE WITH YOUR SAS TOKEN ***'))
Output

xver
11.8.2

Make sure that the archive’s name (first value in pack pair) has the *.zip suffix to prevent collisions when unzipping folders whose name is identical to the archive name.

4 - Machine learning plugins

4.1 - autocluster plugin

Learn how to use the autocluster plugin to find common patterns in data.

autocluster finds common patterns of discrete attributes (dimensions) in the data. It then reduces the results of the original query, whether it’s 100 or 100,000 rows, to a few patterns. The plugin was developed to help analyze failures (such as exceptions or crashes) but can potentially work on any filtered dataset. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate autocluster ([SizeWeight [, WeightColumn [, NumSeeds [, CustomWildcard [, … ]]]]])

Parameters

The parameters must be ordered as specified in the syntax. To indicate that the default value should be used, pass the tilde string '~'. For more information, see Examples.

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
SizeWeightdoubleA double between 0 and 1 that controls the balance between generic (high coverage) and informative (many shared) values. Increasing this value typically reduces the quantity of patterns while expanding coverage. Conversely, decreasing this value generates more specific patterns characterized by increased shared values and a smaller percentage coverage. The default is 0.5. The formula is a weighted geometric mean with weights SizeWeight and 1-SizeWeight.
WeightColumnstringConsiders each row in the input according to the specified weight. Each row has a default weight of 1. The argument must be a name of a numeric integer column. A common usage of a weight column is to take into account sampling or bucketing or aggregation of the data that is already embedded into each row.
NumSeedsintDetermines the number of initial local search points. Adjusting the number of seeds impacts result quantity or quality based on data structure. Increasing seeds can enhance results but with a slower query tradeoff. Decreasing below five yields negligible improvements, while increasing above 50 rarely generates more patterns. The default is 25.
CustomWildcardstringA type literal that sets the wildcard value for a specific type in the results table, indicating no restriction on this column. The default is null, which represents an empty string. If the default is a good value in the data, a different wildcard value should be used, such as *. You can include multiple custom wildcards by adding them consecutively.

Returns

The autocluster plugin usually returns a small set of patterns. The patterns capture portions of the data with shared common values across multiple discrete attributes. Each pattern in the results is represented by a row.

The first column is the segment ID. The next two columns are the count and percentage of rows out of the original query that are captured by the pattern. The remaining columns are from the original query. Their value is either a specific value from the column, or a wildcard value (which is null by default) meaning variable values.

The patterns aren’t distinct, may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.

Examples

Using evaluate

T | evaluate autocluster()

Using autocluster

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)

Output

SegmentIdCountPercentStateEventTypeDamage
0227838.7HailNO
15128.7Thunderstorm WindYES
289815.3TEXAS

Using custom wildcards

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')

Output

SegmentIdCountPercentStateEventTypeDamage
0227838.7*HailNO
15128.7*Thunderstorm WindYES
289815.3TEXAS**

4.2 - basket plugin

Learn how to use the basket plugin to find frequent patterns in data that exceed a frequency threshold.

The basket plugin finds frequent patterns of attributes in the data and returns the patterns that pass a frequency threshold in that data. A pattern represents a subset of the rows that have the same value across one or more columns. The basket plugin is based on the Apriori algorithm originally developed for basket analysis data mining.

Syntax

T | evaluate basket ([ Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, … ])

Parameters

NameTypeRequiredDescription
ThresholddoubleA double in the range of 0.015 to 1 that sets the minimal ratio of the rows to be considered frequent. Patterns with a smaller ratio won’t be returned. The default value is 0.05. To use the default value, input the tilde: ~.

Example: `T
WeightColumnstringThe column name to use to consider each row in the input according to the specified weight. Must be a name of a numeric type column, such as int, long, real. By default, each row has a weight of 1. To use the default value, input the tilde: ~. A common use of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.

Example: `T
MaxDimensionsintSets the maximal number of uncorrelated dimensions per basket, limited by default, to minimize the query runtime. The default is 5. To use the default value, input the tilde: ~.

Example: `T
CustomWildcardstringSets the wildcard value for a specific type in the result table that will indicate that the current pattern doesn’t have a restriction on this column. The default is null except for string columns whose default value is an empty string. If the default is a good value in the data, a different wildcard value should be used, such as *. To use the default value, input the tilde: ~.

Example: `T

Returns

The basket plugin returns frequent patterns that pass a ratio threshold. The default threshold is 0.05.

Each pattern is represented by a row in the results. The first column is the segment ID. The next two columns are the count and percentage of rows from the original query that match the pattern. The remaining columns relate to the original query, with either a specific value from the column or a wildcard value, which is by default null, meaning a variable value.

Example

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)

Output

SegmentIdCountPercentStateEventTypeDamageDamageCrops
0457477.7NO0
1227838.7HailNO0
2567596.40
3237140.3Hail0
4127921.7Thunderstorm Wind0
5246841.9Hail
6131022.3YES
7129121.9Thunderstorm Wind

Example with custom wildcards

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2, '~', '~', '*', int(-1))

Output

SegmentIdCountPercentStateEventTypeDamageDamageCrops
0457477.7**NO0
1227838.7*HailNO0
2567596.4***0
3237140.3*Hail*0
4127921.7*Thunderstorm Wind*0
5246841.9*Hail*-1
6131022.3**YES-1
7129121.9*Thunderstorm Wind*-1

4.3 - diffpatterns plugin

Learn how to use the diffpatterns plugin to compare two datasets of the same structure to find the differences between the two datasets.

Compares two datasets of the same structure and finds patterns of discrete attributes (dimensions) that characterize differences between the two datasets. The plugin is invoked with the evaluate operator.

diffpatterns was developed to help analyze failures (for example, by comparing failures to non-failures in a given time frame), but can potentially find differences between any two datasets of the same structure.

Syntax

T | evaluate diffpatterns(SplitColumn, SplitValueA, SplitValueB [, WeightColumn, Threshold, MaxDimensions, CustomWildcard, …])

Parameters

NameTypeRequiredDescription
SplitColumnstring✔️The column name that tells the algorithm how to split the query into datasets. According to the specified values for the SplitValueA and SplitValueB arguments, the algorithm splits the query into two datasets, “A” and “B”, and analyzes the differences between them. As such, the split column must have at least two distinct values.
SplitValueAstring✔️A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered as dataset “A”.
SplitValueBstring✔️A string representation of one of the values in the SplitColumn that was specified. All the rows that have this value in their SplitColumn are considered as dataset “B”.
WeightColumnstringThe column used to consider each row in the input according to the specified weight. Must be a name of a numeric column, such as int, long, real. By default each row has a weight of ‘1’. To use the default value, input the tilde: ~. A common usage of a weight column is to take into account sampling or bucketing/aggregation of the data that is already embedded into each row.

Example: `T
ThresholdrealA real in the range of 0.015 to 1. This value sets the minimal pattern ratio difference between the two sets. The default is 0.05. To use the default value, input the tilde: ~.

Example: `T
MaxDimensionsintSets the maximum number of uncorrelated dimensions per result pattern. By specifying a limit, you decrease the query runtime. The default is unlimited. To use the default value, input the tilde: ~.

Example: `T
CustomWildcardstringSets the wildcard value for a specific type in the result table that will indicate that the current pattern doesn’t have a restriction on this column. The default is null, except for string columns for which the default is an empty string. If the default is a viable value in the data, a different wildcard value should be used. For example, *. To use the default value, input the tilde: ~.

Example: `T

Returns

diffpatterns returns a small set of patterns that capture different portions of the data in the two sets (that is, a pattern capturing a large percentage of the rows in the first dataset and low percentage of the rows in the second set). Each pattern is represented by a row in the results.

The result of diffpatterns returns the following columns:

  • SegmentId: the identity assigned to the pattern in the current query (note: IDs aren’t guaranteed to be the same in repeating queries).

  • CountA: the number of rows captured by the pattern in Set A (Set A is the equivalent of where tostring(splitColumn) == SplitValueA).

  • CountB: the number of rows captured by the pattern in Set B (Set B is the equivalent of where tostring(splitColumn) == SplitValueB).

  • PercentA: the percentage of rows in Set A captured by the pattern (100.0 * CountA / count(SetA)).

  • PercentB: the percentage of rows in Set B captured by the pattern (100.0 * CountB / count(SetB)).

  • PercentDiffAB: the absolute percentage point difference between A and B (|PercentA - PercentB|) is the main measure of significance of patterns in describing the difference between the two sets.

  • Rest of the columns: the original schema of the input, describing the pattern. Each row (pattern) represents the intersection of the non-wildcard values of the columns (equivalent of where col1==val1 and col2==val2 and ... colN==valN for each non-wildcard value in the row).

For each pattern, columns that aren’t set in the pattern (that is, without restriction on a specific value) will contain a wildcard value, which is null by default. See the Parameters section above for how wildcards can be changed manually.

  • Note: the patterns are often not distinct. They may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.

Example

StormEvents 
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , 1 , 0)
| project State , EventType , Source , Damage, DamageCrops
| evaluate diffpatterns(Damage, "0", "1" )

Output

SegmentIdCountACountBPercentAPercentBPercentDiffABStateEventTypeSourceDamageCrops
022789349.87.142.7Hail0
177951217.0339.0822.05Thunderstorm Wind
2109811824.019.0115Trained Spotter0
31361582.9712.069.09Newspaper
43592147.8516.348.49Flash Flood
5501221.099.318.22IOWA
665527914.3221.36.98Law Enforcement
71501173.288.935.65Flood
83621767.9113.445.52Emergency Manager

4.4 - diffpatterns_text plugin

Learn how to use the diffpatterns_text plugin to compare two string value datasets to find the differences between the two datasets.

Compares two datasets of string values and finds text patterns that characterize differences between the two datasets. The plugin is invoked with the evaluate operator.

The diffpatterns_text plugin returns a set of text patterns that capture different portions of the data in the two sets. For example, a pattern capturing a large percentage of the rows when the condition is true and a low percentage of the rows when the condition is false. The patterns are built from consecutive tokens separated by white space, with a token from the text column or a * representing a wildcard. Each pattern is represented by a row in the results.

Syntax

T | evaluate diffpatterns_text(TextColumn, BooleanCondition [, MinTokens, Threshold , MaxTokens])

Parameters

NameTypeRequiredDescription
TextColumnstring✔️The text column to analyze.
BooleanConditionstring✔️An expression that evaluates to a boolean value. The algorithm splits the query into the two datasets to compare based on this expression.
MinTokensintAn integer value between 0 and 200 that represents the minimal number of non-wildcard tokens per result pattern. The default is 1.
ThresholddecimalA decimal value between 0.015 and 1 that sets the minimal pattern ratio difference between the two sets. Default is 0.05. See diffpatterns.
MaxTokensintAn integer value between 0 and 20 that sets the maximal number of tokens per result pattern; specifying a lower limit decreases the query runtime.

Returns

The result of diffpatterns_text returns the following columns:

  • Count_of_True: The number of rows matching the pattern when the condition is true.
  • Count_of_False: The number of rows matching the pattern when the condition is false.
  • Percent_of_True: The percentage of rows matching the pattern from the rows when the condition is true.
  • Percent_of_False: The percentage of rows matching the pattern from the rows when the condition is false.
  • Pattern: The text pattern containing tokens from the text string and ‘*’ for wildcards.

Example

The following example uses data from the StormEvents table in the help cluster. To access this data, sign in to https://dataexplorer.azure.com/clusters/help/databases/Samples. In the left menu, browse to help > Samples > Tables > Storm_Events.

The examples in this tutorial use the StormEvents table, which is publicly available in the Weather analytics sample data.

StormEvents     
| where EventNarrative != "" and monthofyear(StartTime) > 1 and monthofyear(StartTime) < 9
| where EventType == "Drought" or EventType == "Extreme Cold/Wind Chill"
| evaluate diffpatterns_text(EpisodeNarrative, EventType == "Extreme Cold/Wind Chill", 2)

Output

Count_of_TrueCount_of_FalsePercent_of_TruePercent_of_FalsePattern
1106.290Winds shifting northwest in * wake * a surface trough brought heavy lake effect snowfall downwind * Lake Superior from
905.140Canadian high pressure settled * * region * produced the coldest temperatures since February * 2006. Durations * freezing temperatures
03406.24* * * * * * * * * * * * * * * * * * West Tennessee,
04207.71* * * * * * caused * * * * * * * * across western Colorado. *
04508.26* * below normal *
0110020.18Below normal *
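
The optional Threshold and MaxTokens parameters follow MinTokens positionally. The following sketch raises the threshold to 0.1 and caps patterns at 10 tokens (values chosen only for illustration):

StormEvents     
| where EventNarrative != "" and monthofyear(StartTime) > 1 and monthofyear(StartTime) < 9
| where EventType == "Drought" or EventType == "Extreme Cold/Wind Chill"
| evaluate diffpatterns_text(EpisodeNarrative, EventType == "Extreme Cold/Wind Chill", 2, 0.1, 10)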

5 - Query connectivity plugins

5.1 - ai_embed_text plugin (Preview)

Learn how to use the ai_embed_text plugin to embed text via language models, enabling various AI-related scenarios such as RAG application and semantic search.

The ai_embed_text plugin allows embedding of text using language models, enabling various AI-related scenarios such as Retrieval Augmented Generation (RAG) applications and semantic search. The plugin supports Azure OpenAI Service embedding models accessed using managed identity.

Prerequisites

Syntax

evaluate ai_embed_text (text, connectionString [, options [, IncludeErrorMessages]])

Parameters

NameTypeRequiredDescription
textstring✔️The text to embed. The value can be a column reference or a constant scalar.
connectionStringstring✔️The connection string for the language model in the format <ModelDeploymentUri>;<AuthenticationMethod>; replace <ModelDeploymentUri> and <AuthenticationMethod> with the AI model deployment URI and the authentication method respectively.
optionsdynamicThe options that control calls to the embedding model endpoint. See Options.
IncludeErrorMessagesboolIndicates whether to output errors in a new column in the output table. Default value: false.

Options

The following table describes the options that control the way the requests are made to the embedding model endpoint.

NameTypeDescription
RecordsPerRequestintSpecifies the number of records to process per request. Default value: 1.
CharsPerRequestintSpecifies the maximum number of characters to process per request. Default value: 0 (unlimited). Azure OpenAI counts tokens, with each token approximately translating to four characters.
RetriesOnThrottlingintSpecifies the number of retry attempts when throttling occurs. Default value: 0.
GlobalTimeouttimespanSpecifies the maximum time to wait for a response from the embedding model. Default value: null
ModelParametersdynamicParameters specific to the embedding model, such as embedding dimensions or user identifiers for monitoring purposes. Default value: null.
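
The ModelParameters option isn’t shown in the examples later in this section. The following sketch passes an embedding dimensions parameter; the dimensions name is an assumption based on the Azure OpenAI embeddings API for text-embedding-3 models, and the endpoint is a placeholder:

let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
    "GlobalTimeout": 1m,
    "ModelParameters": { "dimensions": 256 } // assumed model parameter; forwarded to the embedding model
});
evaluate ai_embed_text('Embed this text using AI', connectionString, options)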

Configure managed identity and callout policies

To use the ai_embed_text plugin, you must configure the following policies:

  • managed identity: Allow the system-assigned managed identity to authenticate to Azure OpenAI services.
  • callout: Authorize the AI model endpoint domain.

To configure these policies, use the commands in the following steps:

  1. Configure the managed identity:

    .alter-merge cluster policy managed_identity
    ```
    [
      {
        "ObjectId": "system",
        "AllowedUsages": "AzureAI"
      }
    ]
    ```
    
  2. Configure the callout policy:

    .alter-merge cluster policy callout
    ```
    [
        {
            "CalloutType": "azure_openai",
            "CalloutUriRegex": "https://[A-Za-z0-9\\-]{3,63}\\.openai\\.azure\\.com/.*",
            "CanCall": true
        }
    ]
    ```
    

Returns

Returns the following new embedding columns:

  • A column with the _embedding suffix that contains the embedding values
  • If configured to return errors, a column with the _embedding_error suffix, which contains error strings or is left empty if the operation is successful.

Depending on the input type, the plugin returns different results:

  • Column reference: Returns one or more records with additional columns that are prefixed by the reference column name. For example, if the input column is named TextData, the output columns are named TextData_embedding and, if configured to return errors, TextData_embedding_error.
  • Constant scalar: Returns a single record with additional columns that are not prefixed. The column names are _embedding and, if configured to return errors, _embedding_error.

Examples

The following example embeds the text Embed this text using AI using the Azure OpenAI Embedding model.

let expression = 'Embed this text using AI';
let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
evaluate ai_embed_text(expression, connectionString)

The following example embeds multiple texts using the Azure OpenAI Embedding model.

let connectionString = 'https://myaccount.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-06-01;managed_identity=system';
let options = dynamic({
    "RecordsPerRequest": 10,
    "CharsPerRequest": 10000,
    "RetriesOnThrottling": 1,
    "GlobalTimeout": 2m
});
datatable(TextData: string)
[
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
| evaluate ai_embed_text(TextData, connectionString, options , true)

Best practices

Azure OpenAI embedding models are subject to heavy throttling, and frequent calls to this plugin can quickly reach throttling limits.

To efficiently use the ai_embed_text plugin while minimizing throttling and costs, follow these best practices:

  • Control request size: Adjust the number of records (RecordsPerRequest) and characters per request (CharsPerRequest).
  • Control query timeout: Set GlobalTimeout to a value lower than the query timeout to ensure progress isn’t lost on successful calls up to that point.
  • Handle rate limits more gracefully: Set retries on throttling (RetriesOnThrottling).

5.2 - azure_digital_twins_query_request plugin

Learn how to use the azure_digital_twins_query_request plugin to run an Azure Digital Twins query as part of a Kusto query.

The azure_digital_twins_query_request plugin runs an Azure Digital Twins query as part of a Kusto Query Language (KQL) query. The plugin is invoked with the evaluate operator.

Using the plugin, you can query across data in both Azure Digital Twins and any data source accessible through KQL. For example, you can perform time series analytics.

For more information about the plugin, see Azure Digital Twins query plugin.

Syntax

evaluate azure_digital_twins_query_request ( AdtInstanceEndpoint , AdtQuery )

Parameters

NameTypeRequiredDescription
AdtInstanceEndpointstring✔️The Azure Digital Twins instance endpoint to be queried.
AdtQuerystring✔️The query to run against the Azure Digital Twins endpoint. This query is written in a custom SQL-like query language for Azure Digital Twins, called the Azure Digital Twins query language. For more information, see Query language for Azure Digital Twins.

Authentication and authorization

The azure_digital_twins_query_request plugin uses the Microsoft Entra account of the user running the query to authenticate. To run a query, a user must at least be granted the Azure Digital Twins Data Reader role. Information on how to assign this role can be found in Security for Azure Digital Twins solutions.

Examples

The following examples show how you can run various Azure Digital Twins queries, including queries that use additional Kusto expressions.

Retrieval of all twins within an Azure Digital Twins instance

The following example returns all digital twins within an Azure Digital Twins instance.

evaluate azure_digital_twins_query_request(
  'https://contoso.api.wcus.digitaltwins.azure.net',
  'SELECT T AS Twins FROM DIGITALTWINS T')

Screenshot of the twins present in the Azure Digital Twins instance.

Projection of twin properties as columns along with additional Kusto expressions

The following example returns the result from the plugin as separate columns, and then performs additional operations using Kusto expressions.

evaluate azure_digital_twins_query_request(
  'https://contoso.api.wcus.digitaltwins.azure.net',
  'SELECT T.Temperature, T.Humidity FROM DIGITALTWINS T WHERE IS_PRIMITIVE(T.Temperature) AND IS_PRIMITIVE(T.Humidity)')
| where Temperature > 20
| project TemperatureInC = Temperature, Humidity

Output

TemperatureInCHumidity
2148
4934
8032

Perform time series analytics

You can use the data history integration feature of Azure Digital Twins to historize digital twin property updates. To learn how to view the historized twin updates, see View the historized twin updates.

5.3 - cosmosdb_sql_request plugin

Learn how to use the cosmosdb_sql_request plugin to send a SQL query to an Azure Cosmos DB SQL network endpoint to query small datasets.

The cosmosdb_sql_request plugin sends a SQL query to an Azure Cosmos DB SQL network endpoint and returns the results of the query. This plugin is primarily designed for querying small datasets, for example, enriching data with reference data stored in Azure Cosmos DB. The plugin is invoked with the evaluate operator.

Syntax

evaluate cosmosdb_sql_request ( ConnectionString , SqlQuery [, SqlParameters [, Options]] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points to the Azure Cosmos DB collection to query. It must include AccountEndpoint, Database, and Collection. It might include AccountKey if a master key is used for authentication. For more information, see Authentication and authorization.
Example: 'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>'
SqlQuerystring✔️The query to execute.
SqlParametersdynamicThe property bag object to pass as parameters along with the query. Parameter names must begin with @.
OutputSchemaThe names and types of the expected columns of the cosmosdb_sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ). Specifying this parameter enables multiple query optimizations.
OptionsdynamicA property bag object of advanced settings. If an AccountKey isn’t provided in the ConnectionString, then the armResourceId field of this parameter is required. For more information, see Supported options.

Supported options

The following table describes the supported fields of the Options parameter.

NameTypeDescription
armResourceIdstringThe Azure Resource Manager resource ID of the Cosmos DB database. If an account key isn’t provided in the connection string argument, this field is required. In such a case, the armResourceId is used to authenticate to Cosmos DB.
Example: armResourceId='/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>'
tokenstringA Microsoft Entra access token of a principal with access to the Cosmos DB database. This token is used along with the armResourceId to authenticate with the Azure Resource Manager. If unspecified, the token of the principal that made the query is used.

If armResourceId isn’t specified, the token is used directly to access the Cosmos DB database. For more information about the token authentication method, see Authentication and authorization.
preferredLocationsstringThe region from which to query the data.
Example: ['East US']

Authentication and authorization

To authorize to an Azure Cosmos DB SQL network endpoint, you need to specify the authorization information. The following table provides the supported authentication methods and the description for how to use that method.

Authentication methodDescription
Managed identity (Recommended)Append Authentication="Active Directory Managed Identity";User Id={object_id}; to the connection string. The request is made on behalf of a managed identity which must have the appropriate permissions to the database.
To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy.
Azure Resource Manager resource IDThis authentication method requires specifying the armResourceId and optionally the token in the options. The armResourceId identifies the Cosmos DB database account, and the token must be a valid Microsoft Entra bearer token for a principal with access permissions to the Cosmos DB database. If no token is provided, the Microsoft Entra token of the requesting principal will be used for authentication.
Account keyYou can add the account key directly to the ConnectionString argument. However, this approach is less secure as it involves including the secret in the query text, and is less resilient to future changes in the account key. To enhance security, hide the secret as an obfuscated string literal.
TokenYou can add a token value in the plugin options. The token must belong to a principal with relevant permissions. To enhance security, hide the token as an obfuscated string literal.
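
The following sketch shows the recommended managed identity method, using the same placeholder conventions as the examples later in this section; it assumes the managed identity was added to the cluster and permitted by the managed identity policy:

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;'
    'Authentication="Active Directory Managed Identity";User Id=<ObjectId>;',
    'SELECT c.Id, c.Name FROM c') : (Id:long, Name:string)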

Set callout policy

The plugin makes callouts to the Azure Cosmos DB instance. Make sure that the cluster’s callout policy enables calls of type cosmosdb to the target CosmosDbUri.

The following example shows how to define the callout policy for Azure Cosmos DB. It’s recommended to restrict it to specific endpoints (my_endpoint1, my_endpoint2).

[
  {
    "CalloutType": "CosmosDB",
    "CalloutUriRegex": "my_endpoint1\\.documents\\.azure\\.com",
    "CanCall": true
  },
  {
    "CalloutType": "CosmosDB",
    "CalloutUriRegex": "my_endpoint2\\.documents\\.azure\\.com",
    "CanCall": true
  }
]

The following example shows an .alter callout policy command for the cosmosdb CalloutType:

.alter cluster policy callout @'[{"CalloutType": "cosmosdb", "CalloutUriRegex": "\\.documents\\.azure\\.com", "CanCall": true}]'

Examples

The following examples use placeholder text in angle brackets.

Query Azure Cosmos DB with a query-defined output schema

The following example uses the cosmosdb_sql_request plugin to send a SQL query while selecting only specific columns. This query uses explicit schema definitions that allow various optimizations before the actual query is run against Cosmos DB.

evaluate cosmosdb_sql_request(
  'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
  'SELECT c.Id, c.Name from c') : (Id:long, Name:string) 

Query Azure Cosmos DB

The following example uses the cosmosdb_sql_request plugin to send a SQL query to fetch data from Azure Cosmos DB using its Azure Cosmos DB for NoSQL.

evaluate cosmosdb_sql_request(
  'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
  'SELECT * from c') // OutputSchema is unknown, so it is not specified. This may harm the performance of the query.

Query Azure Cosmos DB with parameters

The following example uses SQL query parameters and queries the data from an alternate region. For more information, see preferredLocations.

evaluate cosmosdb_sql_request(
   'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
    "SELECT c.id, c.lastName, @param0 as Column0 FROM c WHERE c.dob >= '1970-01-01T00:00:00Z'",
    dynamic({'@param0': datetime(2019-04-16 16:47:26.7423305)}),
    dynamic({'preferredLocations': ['East US']})) : (Id:long, Name:string, Column0: datetime) 
| where lastName == 'Smith'

Query Azure Cosmos DB and join data with a database table

The following example joins partner data from an Azure Cosmos DB with partner data in a database using the Partner field. It results in a list of partners with their phone numbers, website, and contact email address sorted by partner name.

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;AccountKey='h'<AccountKey>',
    "SELECT c.id, c.Partner, c. phoneNumber FROM c') : (Id:long, Partner:string, phoneNumber:string) 
| join kind=innerunique Partner on Partner
| project id, Partner, phoneNumber, website, Contact
| sort by Partner

Query Azure Cosmos DB using token authentication

The following example queries data from an Azure Cosmos DB database, authenticating with a Microsoft Entra access token passed in the token option of the plugin.

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
    "SELECT c.Id, c.Name, c.City FROM c",
    dynamic(null),
    dynamic({'token': h'abc123...'})
) : (Id:long, Name:string, City:string)

Query Azure Cosmos DB using Azure Resource Manager resource ID for authentication

The following example uses the Azure Resource Manager resource ID for authentication and the Microsoft Entra token of the requesting principal, since a token isn’t specified. It sends a SQL query while selecting only specific columns and specifies explicit schema definitions.

evaluate cosmosdb_sql_request(
    'AccountEndpoint=https://cosmosdbacc.documents.azure.com/;Database=<MyDatabase>;Collection=<MyCollection>;',
    "SELECT c.Id, c.Name, c.City FROM c",
    dynamic({'armResourceId': '/subscriptions/<SubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.DocumentDb/databaseAccounts/<DatabaseAccount>'})
) : (Id:long, Name:string, City:string)

5.4 - http_request plugin

Learn how to use the http_request plugin to send an HTTP request and convert the response into a table.


The http_request plugin sends an HTTP GET request and converts the response into a table.

Prerequisites

Syntax

evaluate http_request ( Uri [, RequestHeaders [, Options]] )

Parameters

NameTypeRequiredDescription
Uristring✔️The destination URI for the HTTP or HTTPS request.
RequestHeadersdynamicA property bag containing HTTP headers to send with the request.
OptionsdynamicA property bag containing additional properties of the request.

Authentication and authorization

To authenticate, use the HTTP standard Authorization header or any custom header supported by the web service.
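
For example, a bearer token can be passed in the Authorization request header. The following sketch uses a placeholder endpoint and token, mirroring the http_request_post example later in this document:

let uri = 'https://example.com/api/resource'; // hypothetical endpoint
let headers = dynamic({'authorization': 'bearer ...token-for-target-endpoint...'});
evaluate http_request(uri, headers)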

Returns

The plugin returns a table that has a single record with the following dynamic columns:

  • ResponseHeaders: A property bag with the response header.
  • ResponseBody: The response body parsed as a value of type dynamic.

If the HTTP response indicates (via the Content-Type response header) that the media type is application/json, the response body is automatically parsed as if it were a JSON object. Otherwise, it’s returned as-is.

Headers

The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:

NameDescription
x-ms-client-request-idA correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID.
x-ms-readonlyA flag indicating that the processor of this request shouldn’t make any persistent changes.

Example

The following example retrieves Azure retail prices for Azure Purview in West Europe:

let Uri = "https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Azure Purview' and location eq 'EU West'";
evaluate http_request(Uri)
| project ResponseBody.Items
| mv-expand ResponseBody_Items
| evaluate bag_unpack(ResponseBody_Items)

Output

armRegionNamearmSkuNamecurrencyCodeeffectiveStartDateisPrimaryMeterRegionlocationmeterIdmeterNameproductIdproductNameretailPriceserviceFamilyserviceIdserviceNameskuIdskuNametierMinimumUnitstypeunitOfMeasureunitPrice
westeuropeData InsightsUSD2022-06-01T00:00:00ZfalseEU West8ce915f7-20db-564d-8cc3-5702a7c952abData Insights Report ConsumptionDZH318Z08M22Azure Purview Data Map0.21AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/006CCatalog Insights0Consumption1 API Calls0.21
westeuropeData Map Enrichment - Data Insights GenerationUSD2022-06-01T00:00:00ZfalseEU West7ce2db1d-59a0-5193-8a57-0431a10622b6Data Map Enrichment - Data Insights Generation vCoreDZH318Z08M22Azure Purview Data Map0.82AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/005CData Map Enrichment - Insight Generation0Consumption1 Hour0.82
westeuropeUSD2021-09-28T00:00:00ZfalseEU West053e2dcb-82c0-5e50-86cd-1f1c8d803705Power BI vCoreDZH318Z08M23Azure Purview Scanning Ingestion and Classification0AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M23/0005Power BI0Consumption1 Hour0
westeuropeUSD2021-09-28T00:00:00ZfalseEU Westa7f57f26-5f31-51e5-a5ed-ffc2b0da37b9Resource Set vCoreDZH318Z08M22Azure Purview Data Map0.21AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/000XResource Set0Consumption1 Hour0.21
westeuropeUSD2021-09-28T00:00:00ZfalseEU West5d157295-441c-5ea7-ba7c-5083026dc456SQL Server vCoreDZH318Z08M23Azure Purview Scanning Ingestion and Classification0AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M23/000FSQL Server0Consumption1 Hour0
westeuropeUSD2021-09-28T00:00:00ZfalseEU West0745df0d-ce4f-52db-ac31-ac574d4dcfe5Standard Capacity UnitDZH318Z08M22Azure Purview Data Map0.411AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M22/0002Standard0Consumption1 Hour0.411
westeuropeUSD2021-09-28T00:00:00ZfalseEU West811e3118-5380-5ee8-a5d9-01d48d0a0627Standard vCoreDZH318Z08M23Azure Purview Scanning Ingestion and Classification0.63AnalyticsDZH318Q66D0FAzure PurviewDZH318Z08M23/0009Standard0Consumption1 Hour0.63

5.5 - http_request_post plugin

Learn how to use the http_request_post plugin to send an HTTP request and convert the response into a table.


The http_request_post plugin sends an HTTP POST request and converts the response into a table.

Prerequisites

Syntax

evaluate http_request_post ( Uri [, RequestHeaders [, Options [, Content]]] )

Parameters

NameTypeRequiredDescription
Uristring✔️The destination URI for the HTTP or HTTPS request.
RequestHeadersdynamicA property bag containing HTTP headers to send with the request.
OptionsdynamicA property bag containing additional properties of the request.
ContentstringThe body content to send with the request. The content is encoded in UTF-8 and the media type for the Content-Type attribute is application/json.

Authentication and authorization

To authenticate, use the HTTP standard Authorization header or any custom header supported by the web service.

Returns

The plugin returns a table that has a single record with the following dynamic columns:

  • ResponseHeaders: A property bag with the response header.
  • ResponseBody: The response body parsed as a value of type dynamic.

If the HTTP response indicates (via the Content-Type response header) that the media type is application/json, the response body is automatically parsed as if it were a JSON object. Otherwise, it’s returned as-is.

Headers

The RequestHeaders argument can be used to add custom headers to the outgoing HTTP request. In addition to the standard HTTP request headers and the user-provided custom headers, the plugin also adds the following custom headers:

NameDescription
x-ms-client-request-idA correlation ID that identifies the request. Multiple invocations of the plugin in the same query will all have the same ID.
x-ms-readonlyA flag indicating that the processor of this request shouldn’t make any persistent changes.

Example

The following example is for a hypothetical HTTPS web service that accepts additional request headers and requires authentication using Microsoft Entra ID:

let uri='https://example.com/node/js/on/eniac';
let headers=dynamic({'x-ms-correlation-vector':'abc.0.1.0', 'authorization':'bearer ...Azure-AD-bearer-token-for-target-endpoint...'});
evaluate http_request_post(uri, headers)
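
The example above doesn’t pass a request body. The following sketch also sends JSON content (the endpoint and payload are placeholders), passing dynamic(null) for the unused Options argument; the body is sent UTF-8 encoded with a Content-Type of application/json:

let uri = 'https://example.com/api/items'; // hypothetical endpoint
let headers = dynamic({'x-ms-correlation-vector':'abc.0.1.0'});
let options = dynamic(null);
let content = '{"name": "widget", "quantity": 42}';
evaluate http_request_post(uri, headers, options, content)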

5.6 - mysql_request plugin

Learn how to use the mysql_request plugin to send a SQL query to a MySQL server network endpoint.

The mysql_request plugin sends a SQL query to an Azure MySQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.

The plugin is invoked with the evaluate operator.

Syntax

evaluate mysql_request ( ConnectionString , SqlQuery [, SqlParameters] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points at the MySQL Server network endpoint. See authentication and how to specify the network endpoint.
SqlQuerystring✔️The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query.
SqlParametersdynamicA property bag object that holds key-value pairs to pass as parameters along with the query.
OutputSchemaThe names and types for the expected columns of the mysql_request plugin output.

Syntax: ( ColumnName : ColumnType [, …] )

Authentication and authorization

To authorize to a MySQL Server network endpoint, you need to specify the authorization information in the connection string. The supported authorization method is via username and password.

Set callout policy

The plugin makes callouts to the MySql database. Make sure that the cluster’s callout policy enables calls of type mysql to the target MySqlDbUri.

The following example shows how to define the callout policy for MySQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1, my_endpoint2).

[
  {
    "CalloutType": "mysql",
    "CalloutUriRegex": "my_endpoint1\\.mysql\\.database\\.azure\\.com",
    "CanCall": true
  },
  {
    "CalloutType": "mysql",
    "CalloutUriRegex": "my_endpoint2\\.mysql\\.database\\.azure\\.com",
    "CanCall": true
  }
]

The following example shows an .alter callout policy command for mysql CalloutType:

.alter cluster policy callout @'[{"CalloutType": "mysql", "CalloutUriRegex": "\\.mysql\\.database\\.azure\\.com", "CanCall": true}]'

Username and password authentication

The mysql_request plugin only supports username and password authentication to the MySQL server endpoint and doesn’t integrate with Microsoft Entra authentication.

The username and password are provided as part of the connections string using the following parameters:

User ID=...; Password=...;

Encryption and server validation

For security, SslMode is unconditionally set to Required when connecting to a MySQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.

Specify the network endpoint

Specify the MySQL network endpoint as part of the connection string.

Syntax:

Server = FQDN [Port = Port]

Where:

  • FQDN is the fully qualified domain name of the endpoint.
  • Port is the TCP port of the endpoint. By default, 3306 is assumed.

Examples

SQL query to Azure MySQL DB

The following example sends a SQL query to an Azure MySQL database. It retrieves all records from [dbo].[Table], and then processes the results.

evaluate mysql_request(
    'Server=contoso.mysql.database.azure.com; Port = 3306;'
    'Database=Fabrikam;'
    h'UID=USERNAME;'
    h'Pwd=PASSWORD;',
    'select * from `dbo`.`Table`') : (Id: int, Name: string)
| where Id > 0
| project Name

SQL query to an Azure MySQL database with modifications

The following example sends a SQL query to an Azure MySQL database retrieving all records from [dbo].[Table], while appending another datetime column, and then processes the results on the Kusto side. It specifies a SQL parameter (@param0) to be used in the SQL query.

evaluate mysql_request(
    'Server=contoso.mysql.database.azure.com; Port = 3306;'
    'Database=Fabrikam;'
    h'UID=USERNAME;'
    h'Pwd=PASSWORD;',
    'select *, @param0 as dt from `dbo`.`Table`',
    dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name

SQL query to an Azure MySQL database without a query-defined output schema

The following example sends a SQL query to an Azure MySQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.

evaluate mysql_request(
    'Server=contoso.mysql.database.azure.com; Port = 3306;'
    'Database=Fabrikam;'
    h'UID=USERNAME;'
    h'Pwd=PASSWORD;',
    'select * from `dbo`.`Table`')
| where Id > 0
| project Name

5.7 - postgresql_request plugin

Learn how to use the postgresql_request plugin to send a SQL query to a PostgreSQL server network endpoint.

The postgresql_request plugin sends a SQL query to an Azure PostgreSQL Server network endpoint and returns the first rowset in the results. The query may return more than one rowset, but only the first rowset is made available for the rest of the Kusto query.

The plugin is invoked with the evaluate operator.

Syntax

evaluate postgresql_request ( ConnectionString , SqlQuery [, SqlParameters] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points at the PostgreSQL Server network endpoint. See authentication and how to specify the network endpoint.
SqlQuerystring✔️The query that is to be executed against the SQL endpoint. Must return one or more row sets. Only the first set is made available for the rest of the query.
SqlParametersdynamicA property bag object that holds key-value pairs to pass as parameters along with the query.
OutputSchemaThe names and types for the expected columns of the postgresql_request plugin output.

Syntax: ( ColumnName : ColumnType [, …] )

Authentication and authorization

To authorize a PostgreSQL Server network endpoint, you must specify the authorization information in the connection string. The supported authorization method is via username and password.

Set callout policy

The plugin makes callouts to the PostgreSQL database. Make sure that the cluster’s callout policy enables calls of type postgresql to the target PostgreSqlDbUri.

The following example shows how to define the callout policy for PostgreSQL databases. We recommend restricting the callout policy to specific endpoints (my_endpoint1, my_endpoint2).

[
  {
    "CalloutType": "postgresql",
    "CalloutUriRegex": "my_endpoint1\\.postgres\\.database\\.azure\\.com",
    "CanCall": true
  },
  {
    "CalloutType": "postgresql",
    "CalloutUriRegex": "my_endpoint2\\.postgres\\.database\\.azure\\.com",
    "CanCall": true
  }
]

The following example shows a .alter callout policy command for postgresql CalloutType:

.alter cluster policy callout @'[{"CalloutType": "postgresql", "CalloutUriRegex": "\\.postgres\\.database\\.azure\\.com", "CanCall": true}]'

Username and password authentication

The postgresql_request plugin only supports username and password authentication to the PostgreSQL server endpoint and doesn’t integrate with Microsoft Entra authentication.

The username and password are provided as part of the connections string using the following parameters:

User ID=...; Password=...;

Encryption and server validation

For security, SslMode is unconditionally set to Required when connecting to a PostgreSQL server network endpoint. As a result, the server must be configured with a valid SSL/TLS server certificate.

Specify the network endpoint

Specify the PostgreSQL network endpoint as part of the connection string.

Syntax:

Host = FQDN [Port = Port]

Where:

  • FQDN is the fully qualified domain name of the endpoint.
  • Port is the TCP port of the endpoint.

Examples

SQL query to Azure PostgreSQL DB

The following example sends a SQL query to an Azure PostgreSQL database. It retrieves all records from public."Table", and then processes the results.

evaluate postgresql_request(
    'Host=contoso.postgres.database.azure.com; Port = 5432;'
    'Database=Fabrikam;'
    h'User Id=USERNAME;'
    h'Password=PASSWORD;',
    'select * from public."Table"') : (Id: int, Name: string)
| where Id > 0
| project Name

SQL query to an Azure PostgreSQL database with modifications

The following example sends a SQL query to an Azure PostgreSQL database retrieving all records from public."Table", while appending another datetime column, and then processes the results. It specifies a SQL parameter (@param0) to be used in the SQL query.

evaluate postgresql_request(
    'Host=contoso.postgres.database.azure.com; Port = 5432;'
    'Database=Fabrikam;'
    h'User Id=USERNAME;'
    h'Password=PASSWORD;',
    'select *, @param0 as dt from public."Table"',
    dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id: int, Name: string, dt: datetime)
| where Id > 0
| project Name

SQL query to an Azure PostgreSQL database without a query-defined output schema

The following example sends a SQL query to an Azure PostgreSQL database without an output schema. This is not recommended unless the schema is unknown, as it may impact the performance of the query.

evaluate postgresql_request(
    'Host=contoso.postgres.database.azure.com; Port = 5432;'
    'Database=Fabrikam;'
    h'User Id=USERNAME;'
    h'Password=PASSWORD;',
    'select * from public."Table"')
| where Id > 0
| project Name

5.8 - sql_request plugin

Learn how to use the sql_request plugin to send an SQL query to an SQL server network endpoint.

The sql_request plugin sends a SQL query to an Azure SQL Server network endpoint and returns the results. If more than one rowset is returned by SQL, only the first one is used. The plugin is invoked with the evaluate operator.

Syntax

evaluate sql_request ( ConnectionString , SqlQuery [, SqlParameters [, Options]] ) [: OutputSchema]

Parameters

NameTypeRequiredDescription
ConnectionStringstring✔️The connection string that points at the SQL Server network endpoint. See valid methods of authentication and how to specify the network endpoint.
SqlQuerystring✔️The query that is to be executed against the SQL endpoint. The query must return one or more row sets, but only the first one is made available for the rest of the Kusto query.
SqlParametersdynamicA property bag of key-value pairs to pass as parameters along with the query.
OptionsdynamicA property bag of key-value pairs to pass more advanced settings along with the query. Currently, only token can be set, to pass a caller-provided Microsoft Entra access token that is forwarded to the SQL endpoint for authentication.
OutputSchemastringThe names and types for the expected columns of the sql_request plugin output. Use the following syntax: ( ColumnName : ColumnType [, …] ).

Authentication and authorization

The sql_request plugin supports the following methods of authentication to the SQL Server endpoint.

  • Microsoft Entra integrated - Add Authentication="Active Directory Integrated" to the ConnectionString parameter. The user or application authenticates via Microsoft Entra ID to your cluster, and the same token is used to access the SQL Server network endpoint. The principal must have the appropriate permissions on the SQL resource to perform the requested action. For example, to read from the database the principal needs table SELECT permissions, and to write to an existing table the principal needs UPDATE and INSERT permissions. To write to a new table, CREATE permissions are also required.
  • Managed identity - Add Authentication="Active Directory Managed Identity";User Id={object_id} to the ConnectionString parameter. The request is executed on behalf of a managed identity. The managed identity must have the appropriate permissions on the SQL resource to perform the requested action. To enable managed identity authentication, you must add the managed identity to your cluster and alter the managed identity policy. For more information, see Managed Identity policy.
  • Username and password - Add User ID=...; Password=...; to the ConnectionString parameter. When possible, avoid this method as it may be less secure.
  • Microsoft Entra access token - Add dynamic({'token': h"eyJ0..."}) in the Options parameter. The access token is passed as the token property in the Options argument of the plugin.

Examples

Send a SQL query using Microsoft Entra integrated authentication

The following example sends a SQL query to an Azure SQL DB database. It retrieves all records from [dbo].[Table], and then processes the results on the Kusto side. Authentication reuses the calling user’s Microsoft Entra token.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Authentication="Active Directory Integrated";'
    'Initial Catalog=Fabrikam;',
  'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name

Send a SQL query using Username/Password authentication

The following example is identical to the previous one, except that SQL authentication is done by username/password. For confidentiality, we use obfuscated strings here.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Initial Catalog=Fabrikam;'
    h'User ID=USERNAME;'
    h'Password=PASSWORD;',
  'select * from [dbo].[Table]') : (Id:long, Name:string)
| where Id > 0
| project Name

Send a SQL query with modifications

The following example sends a SQL query to an Azure SQL database retrieving all records from [dbo].[Table], while appending another datetime column, and then processes the results on the Kusto side. It specifies a SQL parameter (@param0) to be used in the SQL query.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Authentication="Active Directory Integrated";'
    'Initial Catalog=Fabrikam;',
  'select *, @param0 as dt from [dbo].[Table]',
  dynamic({'param0': datetime(2020-01-01 16:47:26.7423305)})) : (Id:long, Name:string, dt: datetime)
| where Id > 0
| project Name
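Send a SQL query using a Microsoft Entra access token

The authentication list above also covers passing a caller-provided Microsoft Entra access token through the Options argument. The following is a minimal sketch of that pattern; the server, database, and truncated token literal are placeholders, and an empty property bag is passed for SqlParameters because Options is the fourth argument.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Initial Catalog=Fabrikam;',
  'select * from [dbo].[Table]',
  dynamic({}),
  dynamic({'token': h"eyJ0..."})) : (Id:long, Name:string)
| where Id > 0
| project Name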

Send a SQL query without a query-defined output schema

The following example sends a SQL query to an Azure SQL database without an output schema. This isn't recommended unless the schema is unknown, because it can affect the performance of the query.

evaluate sql_request(
  'Server=tcp:contoso.database.windows.net,1433;'
    'Initial Catalog=Fabrikam;'
    h'User ID=USERNAME;'
    h'Password=PASSWORD;',
  'select * from [dbo].[Table]')
| where Id > 0
| project Name

Encryption and server validation

The following connection properties are forced when connecting to a SQL Server network endpoint, for security reasons.

  • Encrypt is set to true unconditionally.
  • TrustServerCertificate is set to false unconditionally.

As a result, the SQL Server must be configured with a valid SSL/TLS server certificate.

Specify the network endpoint

Specifying the SQL network endpoint as part of the connection string is mandatory. The appropriate syntax is:

Server = tcp: FQDN [, Port]

Where:

  • FQDN is the fully qualified domain name of the endpoint.
  • Port is the TCP port of the endpoint. By default, 1433 is assumed.
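For example, the endpoint used in the examples above is specified as:

Server=tcp:contoso.database.windows.net,1433;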

6 - User and sequence analytics plugins

6.1 - active_users_count plugin

Learn how to use the active_users_count plugin to calculate the distinct count of values that appeared in a minimum number of periods in a lookback period.

Calculates distinct count of values, where each value has appeared in at least a minimum number of periods in a lookback period.

Useful for calculating distinct counts of “fans” only, while not including appearances of “non-fans”. A user is counted as a “fan” only if it was active during the lookback period. The lookback period is only used to determine whether a user is considered active (“fan”) or not. The aggregation itself doesn’t include users from the lookback window. In comparison, the sliding_window_counts aggregation is performed over a sliding window of the lookback period.

Syntax

T | evaluate active_users_count(IdColumn, TimelineColumn, Start, End, LookbackWindow, Period, ActivePeriodsCount, Bin , [dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input used to count active users.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents timeline.
Startdatetime✔️The analysis start period.
Enddatetime✔️The analysis end period.
LookbackWindowtimespan✔️The time window defining a period where user appearance is checked. The lookback period starts at ([current appearance] - [lookback window]) and ends on ([current appearance]).
Periodtimespan✔️The constant that defines a single appearance period. A user is counted as active if it appears in at least ActivePeriodsCount distinct periods of this timespan.
ActivePeriodsCountdecimal✔️The minimal number of distinct active periods used to decide whether a user is active. Active users are those who appeared in at least (equal to or greater than) this number of active periods.
Bindecimal, datetime, or timespan✔️A constant value of the analysis step period. May also be a string of week, month, or year, in which case all periods are aligned by the corresponding startofweek, startofmonth, or startofyear functions.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.

Returns

Returns a table with the distinct count of IDs that appeared in at least ActivePeriodsCount periods within the lookback period, for each timeline period and each existing dimensions combination.

Output table schema is:

TimelineColumndim1..dim_ndcount_values
type: as of TimelineColumn......long

Examples

Calculate the weekly number of distinct users that appeared on at least three different days over the prior eight days. The period of analysis is July 2018.

let Start = datetime(2018-07-01);
let End = datetime(2018-07-31);
let LookbackWindow = 8d;
let Period = 1d;
let ActivePeriods = 3;
let Bin = 7d;
let T =  datatable(User:string, Timestamp:datetime)
[
    "B",      datetime(2018-06-29),
    "B",      datetime(2018-06-30),
    "A",      datetime(2018-07-02),
    "B",      datetime(2018-07-04),
    "B",      datetime(2018-07-08),
    "A",      datetime(2018-07-10),
    "A",      datetime(2018-07-14),
    "A",      datetime(2018-07-17),
    "A",      datetime(2018-07-20),
    "B",      datetime(2018-07-24)
];
T | evaluate active_users_count(User, Timestamp, Start, End, LookbackWindow, Period, ActivePeriods, Bin)

Output

Timestampdcount
2018-07-01 00:00:00.00000001
2018-07-15 00:00:00.00000001

A user is considered active if it fulfills both of the following criteria:

  • The user was seen in at least three distinct days (Period = 1d, ActivePeriods=3).
  • The user was seen in a lookback window of 8d before and including their current appearance.

In the illustration below, the only appearances that are active by these criteria are the following instances: User A on 7/20 and User B on 7/4 (see plugin results above). The appearances of User B on 6/29-30 count toward the lookback window of the 7/4 appearance, but fall outside the Start-End time range of the analysis.

Graph showing active users based on the lookback window and active period specified in the query.
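As noted in the Parameters table, Bin can also be the string week, month, or year. As a minimal sketch, reusing the let statements and datatable T from the example above, weekly binning would look like this:

// Reusing the let statements and datatable T defined in the example above
T | evaluate active_users_count(User, Timestamp, Start, End, LookbackWindow, Period, ActivePeriods, 'week')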

6.2 - activity_counts_metrics plugin

Learn how to use the activity_counts_metrics plugin to compare activity metrics in different time windows.

Calculates useful activity metrics for each time window compared/aggregated to all previous time windows. Metrics include: total count values, distinct count values, distinct count of new values, and aggregated distinct count. Compare this plugin to activity_metrics plugin, in which every time window is compared to its previous time window only.

Syntax

T | evaluate activity_counts_metrics(IdColumn, TimelineColumn, Start, End, Step [, Dimensions])

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input used to count activities.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents the timeline.
Startdatetime✔️The analysis start period.
Enddatetime✔️The analysis end period.
Stepdecimal, datetime, or timespan✔️The analysis window period. The value may also be a string of week, month, or year, in which case all periods would be startofweek, startofmonth, or startofyear.
DimensionsstringZero or more comma-separated dimensions columns that slice the activity metrics calculation.

Returns

Returns a table that has the total count values, distinct count values, distinct count of new values, and aggregated distinct count for each time window. If Dimensions are provided, then there’s another column for each dimension in the output table.

The following table describes the output table schema.

Column nameTypeDescription
TimestampSame as the provided TimelineColumn argumentThe time window start time.
countlongThe total records count in the time window and dim(s)
dcountlongThe distinct ID values count in the time window and dim(s)
new_dcountlongThe distinct ID values in the time window and dim(s) compared to all previous time windows.
aggregated_dcountlongThe total aggregated distinct ID values of dim(s) from first-time window to current (inclusive).

Examples

Daily activity counts

The next query calculates daily activity counts for the provided input table.

let start=datetime(2017-08-01);
let end=datetime(2017-08-04);
let window=1d;
let T = datatable(UserId:string, Timestamp:datetime)
[
'A', datetime(2017-08-01),
'D', datetime(2017-08-01),
'J', datetime(2017-08-01),
'B', datetime(2017-08-01),
'C', datetime(2017-08-02),
'T', datetime(2017-08-02),
'J', datetime(2017-08-02),
'H', datetime(2017-08-03),
'T', datetime(2017-08-03),
'T', datetime(2017-08-03),
'J', datetime(2017-08-03),
'B', datetime(2017-08-03),
'S', datetime(2017-08-03),
'S', datetime(2017-08-04),
];
 T
 | evaluate activity_counts_metrics(UserId, Timestamp, start, end, window)

Output

Timestampcountdcountnew_dcountaggregated_dcount
2017-08-01 00:00:00.00000004444
2017-08-02 00:00:00.00000003326
2017-08-03 00:00:00.00000006528
2017-08-04 00:00:00.00000001108
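The optional Dimensions argument slices the counts per dimension value. The following sketch reuses the let statements and datatable T from the example above and adds a hypothetical Group column as a dimension:

// Reusing the let statements and datatable T defined in the example above;
// Group is a hypothetical dimension column added for illustration.
T
| extend Group = iff(UserId == 'A' or UserId == 'B' or UserId == 'J', 'G1', 'G2')
| evaluate activity_counts_metrics(UserId, Timestamp, start, end, window, Group)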

6.3 - activity_engagement plugin

Learn how to use the activity_engagement plugin to calculate activity engagement ratios.

Calculates activity engagement ratio based on ID column over a sliding timeline window.

The activity_engagement plugin can be used for calculating DAU/WAU/MAU (daily/weekly/monthly activities).

Syntax

T | evaluate activity_engagement(IdColumn, TimelineColumn, [Start, End,] InnerActivityWindow, OuterActivityWindow [, dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The tabular input used to calculate engagement.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents timeline.
StartdatetimeThe analysis start period.
EnddatetimeThe analysis end period.
InnerActivityWindowtimespan✔️The inner-scope analysis window period.
OuterActivityWindowtimespan✔️The outer-scope analysis window period.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.

Returns

Returns a table that has a distinct count of ID values inside an inner-scope window, inside an outer-scope window, and the activity ratio for each inner-scope window period for each existing dimensions combination.

Output table schema is:

TimelineColumndcount_activities_innerdcount_activities_outeractivity_ratiodim1..dim_n
type: as of TimelineColumnlonglongdouble......

Examples

DAU/WAU calculation

The following example calculates DAU/WAU (Daily Active Users / Weekly Active Users ratio) over randomly generated data.

// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-01-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/WAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 7d)
| project _day, Dau_Wau=activity_ratio*100 
| render timechart 

:::image type="content" source="media/activity-engagement-plugin/activity-engagement-dau-wau.png" border="false" alt-text="Graph displaying the ratio of daily active users to weekly active users as specified in the query.":::

DAU/MAU calculation

The following example calculates DAU/MAU (Daily Active Users / Monthly Active Users ratio) over randomly generated data.

// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d)
| project _day, Dau_Mau=activity_ratio*100 
| render timechart 

:::image type="content" source="media/activity-engagement-plugin/activity-engagement-dau-mau.png" border="false" alt-text="Graph displaying the ratio of daily active users to monthly active users as specified in the query.":::

DAU/MAU calculation with additional dimensions

The following example calculates DAU/MAU (Daily Active Users / Monthly Active Users ratio) over randomly generated data with an additional dimension (mod3).

// Generate random data of user activities
let _start = datetime(2017-01-01);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+100*r-1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
| extend mod3 = strcat("mod3=", id % 3)
// Calculate DAU/MAU ratio
| evaluate activity_engagement(['id'], _day, _start, _end, 1d, 30d, mod3)
| project _day, Dau_Mau=activity_ratio*100, mod3 
| render timechart 

:::image type="content" source="media/activity-engagement-plugin/activity-engagement-dau-mau-mod3.png" border="false" alt-text="Graph displaying the ratio of daily active users to monthly active users with modulo 3 as specified in the query.":::

6.4 - activity_metrics plugin

Learn how to use the activity_metrics plugin to calculate activity metrics using the current time window compared to the previous window.

Calculates useful metrics that include distinct count values, distinct count of new values, retention rate, and churn rate. This plugin differs from the activity_counts_metrics plugin, in which every time window is compared to all previous time windows; activity_metrics compares each time window to the preceding time window only.

Syntax

T | evaluate activity_metrics(IdColumn, TimelineColumn, [Start, End,] Window [, dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The input used to calculate activity metrics.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents timeline.
Startdatetime✔️The analysis start period.
Enddatetime✔️The analysis end period.
Windowdecimal, datetime, or timespan✔️The analysis window period. This value may also be a string of week, month, or year, in which case all periods will be startofweek, startofmonth, or startofyear respectively.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.

Returns

The plugin returns a table with the distinct count values, distinct count of new values, retention rate, and churn rate for each timeline period for each existing dimensions combination.

Output table schema is:

TimelineColumndcount_valuesdcount_newvaluesretention_ratechurn_ratedim1..dim_n
type: as of TimelineColumnlonglongdoubledouble......

Notes

Retention Rate Definition

Retention Rate over a period is calculated as:

# of customers returned during the period / # of customers at the beginning of the period

where the # of customers returned during the period is defined as:

# of customers at the end of the period - # of new customers acquired during the period

Retention Rate can vary from 0.0 to 1.0. A higher score means a larger number of returning users.

Churn Rate Definition

Churn Rate over a period is calculated as:

# of customers lost in the period / # of customers at the beginning of the period

where the # of customers lost in the period is defined as:

# of customers at the beginning of the period - # of customers returned during the period

Churn Rate can vary from 0.0 to 1.0. A higher score means that a larger number of users are not returning to the service.

Churn vs. Retention Rate

The relation between churn and retention rate is derived from the definitions of Churn Rate and Retention Rate. The following calculation is always true:

churn_rate = 1.0 - retention_rate
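To illustrate this relation, the following sketch runs the plugin over the same randomly generated data used in the example below and reports the largest deviation of retention_rate + churn_rate from 1 (excluding the first window, whose rates are NaN):

// Generate random data of user activities (same generation as the example below)
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| where isnan(retention_rate) == false
| summarize max_deviation = max(abs(retention_rate + churn_rate - 1.0))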

Examples

Weekly retention rate and churn rate

The next query calculates the retention and churn rate for a week-over-week window.

// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, retention_rate, churn_rate
| render timechart

Output

_dayretention_ratechurn_rate
2017-01-02 00:00:00.0000000NaNNaN
2017-01-09 00:00:00.00000000.1799100449775110.820089955022489
2017-01-16 00:00:00.00000000.7443744374437440.255625562556256
2017-01-23 00:00:00.00000000.6120967741935480.387903225806452
2017-01-30 00:00:00.00000000.6811414392059550.318858560794045
2017-02-06 00:00:00.00000000.2781456953642380.721854304635762
2017-02-13 00:00:00.00000000.2231726283048210.776827371695179
2017-02-20 00:00:00.00000000.380.62
2017-02-27 00:00:00.00000000.2955190017016450.704480998298355
2017-03-06 00:00:00.00000000.2803877703206560.719612229679344
2017-03-13 00:00:00.00000000.3606281547952890.639371845204711
2017-03-20 00:00:00.00000000.2880080280983440.711991971901656
2017-03-27 00:00:00.00000000.3061349693251530.693865030674847
2017-04-03 00:00:00.00000000.3568665377176020.643133462282398
2017-04-10 00:00:00.00000000.4950980392156860.504901960784314
2017-04-17 00:00:00.00000000.1982968369829680.801703163017032
2017-04-24 00:00:00.00000000.06188118811881190.938118811881188
2017-05-01 00:00:00.00000000.2046577275935070.795342272406493
2017-05-08 00:00:00.00000000.5173913043478260.482608695652174
2017-05-15 00:00:00.00000000.1436672967863890.856332703213611
2017-05-22 00:00:00.00000000.1991223258365330.800877674163467
2017-05-29 00:00:00.00000000.0634689922480620.936531007751938

:::image type="content" source="media/activity-metrics-plugin/activity-metrics-churn-and-retention.png" border="false" alt-text="Table showing the calculated retention and churn rates per seven days as specified in the query.":::

Distinct values and distinct ’new’ values

The next query calculates distinct values and ’new’ values (IDs that didn’t appear in the previous time window) for a week-over-week window.

// Generate random data of user activities
let _start = datetime(2017-01-02);
let _end = datetime(2017-05-31);
range _day from _start to _end  step 1d
| extend d = tolong((_day - _start)/1d)
| extend r = rand()+1
| extend _users=range(tolong(d*50*r), tolong(d*50*r+200*r-1), 1)
| mv-expand id=_users to typeof(long) limit 1000000
//
| evaluate activity_metrics(['id'], _day, _start, _end, 7d)
| project _day, dcount_values, dcount_newvalues
| render timechart

Output

_daydcount_valuesdcount_newvalues
2017-01-02 00:00:00.0000000630630
2017-01-09 00:00:00.0000000738575
2017-01-16 00:00:00.00000001187841
2017-01-23 00:00:00.00000001092465
2017-01-30 00:00:00.00000001261647
2017-02-06 00:00:00.000000017441043
2017-02-13 00:00:00.00000001563432
2017-02-20 00:00:00.00000001406818
2017-02-27 00:00:00.000000019561429
2017-03-06 00:00:00.00000001593848
2017-03-13 00:00:00.000000018011423
2017-03-20 00:00:00.000000017101017
2017-03-27 00:00:00.000000017961516
2017-04-03 00:00:00.000000013811008
2017-04-10 00:00:00.000000017561162
2017-04-17 00:00:00.000000018311409
2017-04-24 00:00:00.000000018231164
2017-05-01 00:00:00.000000018111353
2017-05-08 00:00:00.000000016911246
2017-05-15 00:00:00.000000018121608
2017-05-22 00:00:00.000000017401017
2017-05-29 00:00:00.0000000960756

:::image type="content" source="media/activity-metrics-plugin/activity-metrics-dcount-and-dcount-newvalues.png" border="false" alt-text="Table showing the count of distinct values (dcount_values) and of new distinct values (dcount_newvalues) that didn't appear in previous time window as specified in the query.":::

6.5 - funnel_sequence plugin

Learn how to use the funnel_sequence plugin to calculate the distinct count of users who have taken a sequence of states, and the distribution of previous/next states that led to or followed the sequence.

Calculates distinct count of users who have taken a sequence of states, and the distribution of previous/next states that have led to/were followed by the sequence. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate funnel_sequence(IdColumn, TimelineColumn, Start, End, MaxSequenceStepWindow, Step, StateColumn, Sequence)

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
IdColumnstring✔️The column reference representing the ID. This column must be present in T.
TimelineColumnstring✔️The column reference representing the timeline. This column must be present in T.
Startdatetime, timespan, or long✔️The analysis start period.
Enddatetime, timespan, or long✔️The analysis end period.
MaxSequenceStepWindowdatetime, timespan, or long✔️The value of the max allowed timespan between two sequential steps in the sequence.
Stepdatetime, timespan, or long✔️The analysis step period, or bin.
StateColumnstring✔️The column reference representing the state. This column must be present in T.
Sequencedynamic✔️An array with the sequence values that are looked up in StateColumn.

Returns

Returns three output tables, which are useful for constructing a sankey diagram for the analyzed sequence:

  • Table #1 - prev-sequence-next dcount

    • TimelineColumn: the analyzed time window
    • prev: the prev state (may be empty if there were any users that only had events for the searched sequence, but not any events prior to it).
    • next: the next state (may be empty if there were any users that only had events for the searched sequence, but not any events that followed it).
    • dcount: distinct count of IdColumn in time window that transitioned prev -> Sequence -> next.
    • samples: an array of IDs (from IdColumn) corresponding to the row’s sequence (a maximum of 128 IDs are returned).
  • Table #2 - prev-sequence dcount

    • TimelineColumn: the analyzed time window
    • prev: the prev state (may be empty if there were any users that only had events for the searched sequence, but not any events prior to it).
    • dcount: distinct count of IdColumn in time window that transitioned prev -> Sequence.
    • samples: an array of IDs (from IdColumn) corresponding to the row’s sequence (a maximum of 128 IDs are returned).
  • Table #3 - sequence-next dcount

    • TimelineColumn: the analyzed time window
    • next: the next state (may be empty if there were any users that only had events for the searched sequence, but not any events that followed it).
    • dcount: distinct count of IdColumn in time window that transitioned Sequence -> next.
    • samples: an array of IDs (from IdColumn) corresponding to the row’s sequence (a maximum of 128 IDs are returned).

Examples

Exploring storm events

The following query looks at the table StormEvents (weather statistics for 2007) and shows which events happened before/after all Tornado events occurred in 2007.

// Looking on StormEvents statistics: 
// Q1: What happens before Tornado event?
// Q2: What happens after Tornado event?
StormEvents
| evaluate funnel_sequence(EpisodeId, StartTime, datetime(2007-01-01), datetime(2008-01-01), 1d,365d, EventType, dynamic(['Tornado']))

Result includes three tables:

  • Table #1: All possible variants of what happened before and after the sequence. For example, the second line means that there were 87 different events that had the following sequence: Hail -> Tornado -> Hail
StartTimeprevnextdcount
2007-01-01 00:00:00.0000000293
2007-01-01 00:00:00.0000000HailHail87
2007-01-01 00:00:00.0000000Thunderstorm WindThunderstorm Wind77
2007-01-01 00:00:00.0000000HailThunderstorm Wind28
2007-01-01 00:00:00.0000000Hail28
2007-01-01 00:00:00.0000000Hail27
2007-01-01 00:00:00.0000000Thunderstorm Wind25
2007-01-01 00:00:00.0000000Thunderstorm WindHail24
2007-01-01 00:00:00.0000000Thunderstorm Wind24
2007-01-01 00:00:00.0000000Flash FloodFlash Flood12
2007-01-01 00:00:00.0000000Thunderstorm WindFlash Flood8
2007-01-01 00:00:00.0000000Flash Flood8
2007-01-01 00:00:00.0000000Funnel CloudThunderstorm Wind6
2007-01-01 00:00:00.0000000Funnel Cloud6
2007-01-01 00:00:00.0000000Flash Flood6
2007-01-01 00:00:00.0000000Funnel CloudFunnel Cloud6
2007-01-01 00:00:00.0000000HailFlash Flood4
2007-01-01 00:00:00.0000000Flash FloodThunderstorm Wind4
2007-01-01 00:00:00.0000000HailFunnel Cloud4
2007-01-01 00:00:00.0000000Funnel CloudHail4
2007-01-01 00:00:00.0000000Funnel Cloud4
2007-01-01 00:00:00.0000000Thunderstorm WindFunnel Cloud3
2007-01-01 00:00:00.0000000Heavy RainThunderstorm Wind2
2007-01-01 00:00:00.0000000Flash FloodFunnel Cloud2
2007-01-01 00:00:00.0000000Flash FloodHail2
2007-01-01 00:00:00.0000000Strong WindThunderstorm Wind1
2007-01-01 00:00:00.0000000Heavy RainFlash Flood1
2007-01-01 00:00:00.0000000Heavy RainHail1
2007-01-01 00:00:00.0000000HailFlood1
2007-01-01 00:00:00.0000000LightningHail1
2007-01-01 00:00:00.0000000Heavy RainLightning1
2007-01-01 00:00:00.0000000Funnel CloudHeavy Rain1
2007-01-01 00:00:00.0000000Flash FloodFlood1
2007-01-01 00:00:00.0000000FloodFlash Flood1
2007-01-01 00:00:00.0000000Heavy Rain1
2007-01-01 00:00:00.0000000Funnel CloudLightning1
2007-01-01 00:00:00.0000000LightningThunderstorm Wind1
2007-01-01 00:00:00.0000000FloodThunderstorm Wind1
2007-01-01 00:00:00.0000000HailLightning1
2007-01-01 00:00:00.0000000Lightning1
2007-01-01 00:00:00.0000000Tropical StormHurricane (Typhoon)1
2007-01-01 00:00:00.0000000Coastal Flood1
2007-01-01 00:00:00.0000000Rip Current1
2007-01-01 00:00:00.0000000Heavy Snow1
2007-01-01 00:00:00.0000000Strong Wind1
  • Table #2: shows all distinct events grouped by the previous event. For example, the second line shows that there were a total of 150 events of Hail that happened just before Tornado.
StartTimeprevdcount
2007-01-01 00:00:00.0000000331
2007-01-01 00:00:00.0000000Hail150
2007-01-01 00:00:00.0000000Thunderstorm Wind135
2007-01-01 00:00:00.0000000Flash Flood28
2007-01-01 00:00:00.0000000Funnel Cloud22
2007-01-01 00:00:00.0000000Heavy Rain5
2007-01-01 00:00:00.0000000Flood2
2007-01-01 00:00:00.0000000Lightning2
2007-01-01 00:00:00.0000000Strong Wind2
2007-01-01 00:00:00.0000000Heavy Snow1
2007-01-01 00:00:00.0000000Rip Current1
2007-01-01 00:00:00.0000000Coastal Flood1
2007-01-01 00:00:00.0000000Tropical Storm1
  • Table #3: shows all distinct events grouped by next event. For example, the second line shows that there were a total of 145 events of Hail that happened after Tornado.
StartTimenextdcount
2007-01-01 00:00:00.0000000332
2007-01-01 00:00:00.0000000Hail145
2007-01-01 00:00:00.0000000Thunderstorm Wind143
2007-01-01 00:00:00.0000000Flash Flood32
2007-01-01 00:00:00.0000000Funnel Cloud21
2007-01-01 00:00:00.0000000Lightning4
2007-01-01 00:00:00.0000000Heavy Rain2
2007-01-01 00:00:00.0000000Flood2
2007-01-01 00:00:00.0000000Hurricane (Typhoon)1

Now, let’s try to find out how the following sequence continues:
Hail -> Tornado -> Thunderstorm Wind

StormEvents
| evaluate funnel_sequence(
               EpisodeId,
               StartTime,
               datetime(2007-01-01),
               datetime(2008-01-01),
               1d,
               365d,
               EventType, 
               dynamic(['Hail', 'Tornado', 'Thunderstorm Wind'])
           )

Skipping Table #1 and Table #2, and looking at Table #3, we can conclude that the sequence Hail -> Tornado -> Thunderstorm Wind ended there in 92 events, continued as Hail in 41 events, and turned back to Tornado in 14 events.

StartTimenextdcount
2007-01-01 00:00:00.000000092
2007-01-01 00:00:00.0000000Hail41
2007-01-01 00:00:00.0000000Tornado14
2007-01-01 00:00:00.0000000Flash Flood11
2007-01-01 00:00:00.0000000Lightning2
2007-01-01 00:00:00.0000000Heavy Rain1
2007-01-01 00:00:00.0000000Flood1

6.6 - funnel_sequence_completion plugin

Learn how to use the funnel_sequence_completion plugin to calculate a funnel of completed sequence steps while comparing different time periods.

Calculates a funnel of completed sequence steps while comparing different time periods. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate funnel_sequence_completion(IdColumn, TimelineColumn, Start, End, BinSize, StateColumn, Sequence, MaxSequenceStepPeriods)

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
IdColumnstring✔️The column reference representing the ID. The column must be present in T.
TimelineColumnstring✔️The column reference representing the timeline. The column must be present in T.
Startdatetime, timespan, or long✔️The analysis start period.
Enddatetime, timespan, or long✔️The analysis end period.
BinSizedatetime, timespan, or long✔️The analysis window size. Each window is analyzed separately.
StateColumnstring✔️The column reference representing the state. The column must be present in T.
Sequencedynamic✔️An array with the sequence values that are looked up in StateColumn.
MaxSequenceStepPeriodsdynamic✔️An array with the values of the max allowed timespan between the first and last sequential steps in the sequence. Each period in the array generates a funnel analysis result.

Returns

Returns a single table useful for constructing a funnel diagram for the analyzed sequence:

  • TimelineColumn: the analyzed time window (bin), each bin in the analysis timeframe (Start to End) generates a funnel analysis separately.
  • StateColumn: the state of the sequence.
  • Period: the maximal period allowed for completing steps in the funnel sequence measured from the first step in the sequence. Each value in MaxSequenceStepPeriods generates a funnel analysis with a separate period.
  • dcount: distinct count of IdColumn in time window that transitioned from first sequence state to the value of StateColumn.

Examples

Exploring Storm Events

The following query checks the completion funnel of the sequence Hail -> Tornado -> Thunderstorm Wind in an "overall" time of 1 hour, 4 hours, and 1 day.

let _start = datetime(2007-01-01);
let _end =  datetime(2008-01-01);
let _windowSize = 365d;
let _sequence = dynamic(['Hail', 'Tornado', 'Thunderstorm Wind']);
let _periods = dynamic([1h, 4h, 1d]);
StormEvents
| evaluate funnel_sequence_completion(EpisodeId, StartTime, _start, _end, _windowSize, EventType, _sequence, _periods) 

Output

StartTimeEventTypePerioddcount
2007-01-01 00:00:00.0000000Hail01:00:002877
2007-01-01 00:00:00.0000000Tornado01:00:00208
2007-01-01 00:00:00.0000000Thunderstorm Wind01:00:0087
2007-01-01 00:00:00.0000000Hail04:00:002877
2007-01-01 00:00:00.0000000Tornado04:00:00231
2007-01-01 00:00:00.0000000Thunderstorm Wind04:00:00141
2007-01-01 00:00:00.0000000Hail1.00:00:002877
2007-01-01 00:00:00.0000000Tornado1.00:00:00244
2007-01-01 00:00:00.0000000Thunderstorm Wind1.00:00:00155

Understanding the results:
The outcome is three funnels (for periods of one hour, four hours, and one day). For each funnel step, the distinct count of IDs is shown. You can see that the more time is given to complete the whole sequence of Hail -> Tornado -> Thunderstorm Wind, the higher the dcount value obtained. In other words, more occurrences of the sequence reached that funnel step.

6.7 - new_activity_metrics plugin

Learn how to use the new_activity_metrics plugin to calculate activity metrics.

Calculates useful activity metrics (distinct count values, distinct count of new values, retention rate, and churn rate) for the cohort of New Users. Each cohort of New Users (all users who were first seen in a given time window) is compared to all prior cohorts. The comparison takes into account all previous time windows. For example, for records from T2 to T3, the distinct count of users is all users in T3 who weren't seen in either T1 or T2. The plugin is invoked with the evaluate operator.

Syntax

TabularExpression | evaluate new_activity_metrics(IdColumn, TimelineColumn, Start, End, Window [, Cohort] [, dim1, dim2, …] [, Lookback] )

Parameters

NameTypeRequiredDescription
TabularExpressionstring✔️The tabular expression for which to calculate activity metrics.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column that represents the timeline.
Startscalar✔️The value of the analysis start period.
Endscalar✔️The value of the analysis end period.
Windowscalar✔️The value of the analysis window period. Can be a numeric, datetime, or timespan value, or a string that is one of week, month or year, in which case all periods will be startofweek/startofmonth/startofyear accordingly. When using startofweek, make sure start time is a Sunday, otherwise first cohort will be empty (since startofweek is considered to be a Sunday).
CohortscalarIndicates a specific cohort. If not provided, all cohorts corresponding to the analysis time window are calculated and returned.
dim1, dim2, …dynamicAn array of the dimensions columns that slice the activity metrics calculation.
LookbackstringA tabular expression with a set of IDs that belong to the ’look back’ period.

Returns

Returns a table that contains the following for each combination of ‘from’ and ’to’ timeline periods and for each existing column (dimensions) combination:

  • distinct count values
  • distinct count of new values
  • retention rate
  • churn rate

Output table schema is:

from_TimelineColumnto_TimelineColumndcount_new_valuesdcount_retained_valuesdcount_churn_valuesretention_ratechurn_ratedim1..dim_n
type: as of TimelineColumnsamelonglongdoubledoubledouble......
  • from_TimelineColumn - the cohort of new users. Metrics in this record refer to all users who were first seen in this period. The decision on first seen takes into account all previous periods in the analysis period.
  • to_TimelineColumn - the period being compared to.
  • dcount_new_values - the number of distinct users in to_TimelineColumn that weren’t seen in all periods prior to and including from_TimelineColumn.
  • dcount_retained_values - out of all new users, first seen in from_TimelineColumn, the number of distinct users that were seen in to_TimelineColumn.
  • dcount_churn_values - out of all new users, first seen in from_TimelineColumn, the number of distinct users that weren't seen in to_TimelineColumn.
  • retention_rate - the percent of dcount_retained_values out of the cohort (users first seen in from_TimelineColumn).
  • churn_rate - the percent of dcount_churn_values out of the cohort (users first seen in from_TimelineColumn).

Examples

The following sample dataset shows which users were seen on which days. The table was generated based on a source Users table, as follows:

Users | summarize tostring(make_set(user)) by bin(Timestamp, 1d) | order by Timestamp asc;

Output

Timestampset_user
2019-11-01 00:00:00.0000000[0,2,3,4]
2019-11-02 00:00:00.0000000[0,1,3,4,5]
2019-11-03 00:00:00.0000000[0,2,4,5]
2019-11-04 00:00:00.0000000[0,1,2,3]
2019-11-05 00:00:00.0000000[0,1,2,3,4]

The output of the plugin for the original table is the following:

let StartDate = datetime(2019-11-01 00:00:00);
let EndDate = datetime(2019-11-07 00:00:00);
Users 
| evaluate new_activity_metrics(user, Timestamp, StartDate, EndDate-1tick, 1d) 
| where from_Timestamp < datetime(2019-11-03 00:00:00.0000000)

Output

Rfrom_Timestampto_Timestampdcount_new_valuesdcount_retained_valuesdcount_churn_valuesretention_ratechurn_rate
12019-11-01 00:00:00.00000002019-11-01 00:00:00.000000044010
22019-11-01 00:00:00.00000002019-11-02 00:00:00.00000002310.750.25
32019-11-01 00:00:00.00000002019-11-03 00:00:00.00000001310.750.25
42019-11-01 00:00:00.00000002019-11-04 00:00:00.00000001310.750.25
52019-11-01 00:00:00.00000002019-11-05 00:00:00.000000014010
62019-11-01 00:00:00.00000002019-11-06 00:00:00.000000000401
72019-11-02 00:00:00.00000002019-11-02 00:00:00.000000022010
82019-11-02 00:00:00.00000002019-11-03 00:00:00.00000000110.50.5
92019-11-02 00:00:00.00000002019-11-04 00:00:00.00000000110.50.5
102019-11-02 00:00:00.00000002019-11-05 00:00:00.00000000110.50.5
112019-11-02 00:00:00.00000002019-11-06 00:00:00.000000000201

Following is an analysis of a few records from the output:

  • Record R=3, from_TimelineColumn = 2019-11-01, to_TimelineColumn = 2019-11-03:

    • The users considered for this record are all new users seen on 11/1. Since this is the first period, these are all users in that bin – [0,2,3,4]
    • dcount_new_values – the number of users on 11/3 who weren’t seen on 11/1. This includes a single user – 5.
    • dcount_retained_values – out of all new users on 11/1, how many were retained until 11/3? There are three values ([0,2,4]), while dcount_churn_values is one (user=3).
    • retention_rate = 0.75 – the three retained users out of the four new users who were first seen in 11/1.
  • Record R=9, from_TimelineColumn = 2019-11-02, to_TimelineColumn = 2019-11-04:

    • This record focuses on the new users who were first seen on 11/2 – users 1 and 5.
    • dcount_new_values – the number of users on 11/4 who weren’t seen through all periods T0 .. from_Timestamp. Meaning, users who are seen on 11/4 but who weren’t seen on either 11/1 or 11/2 – there are no such users.
    • dcount_retained_values – out of all new users on 11/2 ([1,5]), how many were retained until 11/4? There’s one such user ([1]), while dcount_churn_values is one (user 5).
    • retention_rate is 0.5 – the single user that was retained on 11/4 out of the two new ones on 11/2.

Weekly retention rate, and churn rate (single week)

The next query calculates the retention and churn rate for a week-over-week window for the New Users cohort (users that arrived in the first week).

// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Take only the first week cohort (last parameter)
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d, _start)
| project from_Day, to_Day, retention_rate, churn_rate

Output

from_Dayto_Dayretention_ratechurn_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.000000010
2017-05-01 00:00:00.00000002017-05-08 00:00:00.00000000.5446327683615820.455367231638418
2017-05-01 00:00:00.00000002017-05-15 00:00:00.00000000.0316384180790960.968361581920904
2017-05-01 00:00:00.00000002017-05-22 00:00:00.000000001
2017-05-01 00:00:00.00000002017-05-29 00:00:00.000000001

Weekly retention rate, and churn rate (complete matrix)

The next query calculates the retention and churn rate for a week-over-week window for the New Users cohort. While the previous example calculated the statistics for a single week, the following query produces an NxN table for each from/to combination.

// Generate random data of user activities
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
range Day from _start to _end step 1d
| extend d = tolong((Day - _start) / 1d)
| extend r = rand() + 1
| extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1) 
| mv-expand id=_users to typeof(long) limit 1000000
// Last parameter is omitted - 
| evaluate new_activity_metrics(['id'], Day, _start, _end, 7d)
| project from_Day, to_Day, retention_rate, churn_rate

Output

from_Dayto_Dayretention_ratechurn_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.000000010
2017-05-01 00:00:00.00000002017-05-08 00:00:00.00000000.1903973509933770.809602649006622
2017-05-01 00:00:00.00000002017-05-15 00:00:00.000000001
2017-05-01 00:00:00.00000002017-05-22 00:00:00.000000001
2017-05-01 00:00:00.00000002017-05-29 00:00:00.000000001
2017-05-08 00:00:00.00000002017-05-08 00:00:00.000000010
2017-05-08 00:00:00.00000002017-05-15 00:00:00.00000000.4052631578947370.594736842105263
2017-05-08 00:00:00.00000002017-05-22 00:00:00.00000000.2276315789473680.772368421052632
2017-05-08 00:00:00.00000002017-05-29 00:00:00.000000001
2017-05-15 00:00:00.00000002017-05-15 00:00:00.000000010
2017-05-15 00:00:00.00000002017-05-22 00:00:00.00000000.7854889589905360.214511041009464
2017-05-15 00:00:00.00000002017-05-29 00:00:00.00000000.2376445846477390.762355415352261
2017-05-22 00:00:00.00000002017-05-22 00:00:00.000000010
2017-05-22 00:00:00.00000002017-05-29 00:00:00.00000000.6218354430379750.378164556962025
2017-05-29 00:00:00.00000002017-05-29 00:00:00.000000010

Weekly retention rate with lookback period

The following query calculates the retention rate of the New Users cohort while taking a lookback period into consideration: a tabular query with a set of IDs that are used to define the New Users cohort (all IDs that don't appear in this set are New Users). The query examines the retention behavior of the New Users during the analysis period.

// Generate random data of user activities
let _lookback = datetime(2017-02-01);
let _start = datetime(2017-05-01);
let _end = datetime(2017-05-31);
let _data = range Day from _lookback to _end step 1d
    | extend d = tolong((Day - _lookback) / 1d)
    | extend r = rand() + 1
    | extend _users=range(tolong(d * 50 * r), tolong(d * 50 * r + 200 * r - 1), 1) 
    | mv-expand id=_users to typeof(long) limit 1000000;
//
let lookback_data = _data | where Day < _start | project Day, id;
_data
| evaluate new_activity_metrics(id, Day, _start, _end, 7d, _start, lookback_data)
| project from_Day, to_Day, retention_rate

Output

from_Dayto_Dayretention_rate
2017-05-01 00:00:00.00000002017-05-01 00:00:00.00000001
2017-05-01 00:00:00.00000002017-05-08 00:00:00.00000000.404081632653061
2017-05-01 00:00:00.00000002017-05-15 00:00:00.00000000.257142857142857
2017-05-01 00:00:00.00000002017-05-22 00:00:00.00000000.296326530612245
2017-05-01 00:00:00.00000002017-05-29 00:00:00.00000000.0587755102040816

6.8 - rolling_percentile plugin

Learn how to use the rolling_percentile plugin to calculate an estimate of the rolling percentile per bin for the specified value column.

Returns an estimate for the specified percentile of the ValueColumn population in a rolling (sliding) window of BinsPerWindow bins per BinSize.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate rolling_percentile(ValueColumn, Percentile, IndexColumn, BinSize, BinsPerWindow [, dim1, dim2, …] )

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
ValueColumnstring✔️The name of the column used to calculate the percentiles.
Percentileint, long, or real✔️Scalar with the percentile to calculate.
IndexColumnstring✔️The name of the column over which to run the rolling window.
BinSizeint, long, real, datetime, or timespan✔️Scalar with size of the bins to apply over the IndexColumn.
BinsPerWindowint✔️The number of bins included in each window.
dim1, dim2, …stringA list of the dimensions columns to slice by.

Returns

Returns a table with a row for each bin (and combination of dimensions, if specified) that has the rolling percentile of values in the window ending at that bin (inclusive). Output table schema is:

IndexColumndim1dim_nrolling_BinsPerWindow_percentile_ValueColumn_Pct

Examples

Rolling 3-day median value per day

The next query calculates a 3-day median value in daily granularity. Each row in the output represents the median value for the last 3 bins (days), including the bin itself.

let T = 
    range idx from 0 to 24 * 10 - 1 step 1
    | project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
    | extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T  
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3)

Output

Timestamprolling_3_percentile_val_50
2018-01-01 00:00:00.000000012
2018-01-02 00:00:00.000000024
2018-01-03 00:00:00.000000036
2018-01-04 00:00:00.000000060
2018-01-05 00:00:00.000000084
2018-01-06 00:00:00.0000000108
2018-01-07 00:00:00.0000000132
2018-01-08 00:00:00.0000000156
2018-01-09 00:00:00.0000000180
2018-01-10 00:00:00.0000000204

Rolling 3-day median value per day by dimension

The same example as above, but now the rolling window is also partitioned by each value of the dimension.

let T = 
    range idx from 0 to 24 * 10 - 1 step 1
    | project Timestamp = datetime(2018-01-01) + 1h * idx, val=idx + 1
    | extend EvenOrOdd = iff(val % 2 == 0, "Even", "Odd");
T  
| evaluate rolling_percentile(val, 50, Timestamp, 1d, 3, EvenOrOdd)

Output

TimestampEvenOrOddrolling_3_percentile_val_50
2018-01-01 00:00:00.0000000Even12
2018-01-02 00:00:00.0000000Even24
2018-01-03 00:00:00.0000000Even36
2018-01-04 00:00:00.0000000Even60
2018-01-05 00:00:00.0000000Even84
2018-01-06 00:00:00.0000000Even108
2018-01-07 00:00:00.0000000Even132
2018-01-08 00:00:00.0000000Even156
2018-01-09 00:00:00.0000000Even180
2018-01-10 00:00:00.0000000Even204
2018-01-01 00:00:00.0000000Odd11
2018-01-02 00:00:00.0000000Odd23
2018-01-03 00:00:00.0000000Odd35
2018-01-04 00:00:00.0000000Odd59
2018-01-05 00:00:00.0000000Odd83
2018-01-06 00:00:00.0000000Odd107
2018-01-07 00:00:00.0000000Odd131
2018-01-08 00:00:00.0000000Odd155
2018-01-09 00:00:00.0000000Odd179
2018-01-10 00:00:00.0000000Odd203

6.9 - rows_near plugin

Learn how to use the rows_near plugin to find rows near a specified condition.

Finds rows near a specified condition.

The plugin is invoked with the evaluate operator.

Syntax

T | evaluate rows_near(Condition, NumRows [, RowsAfter])

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
Conditionbool✔️Represents the condition to find rows around.
NumRowsint✔️The number of rows to find before and after the condition.
RowsAfterintWhen specified, overrides the number of rows to find after the condition.

Returns

Returns every row from the input that is within NumRows of a true Condition. When RowsAfter is specified, returns every row from the input that is within NumRows before or RowsAfter after a true Condition.

Example

The following example finds rows with an "Error" State and returns the two rows before and after each "Error" record.

datatable (Timestamp:datetime, Value:long, State:string )
[
    datetime(2021-06-01), 1, "Success",
    datetime(2021-06-02), 4, "Success",
    datetime(2021-06-03), 3, "Success",
    datetime(2021-06-04), 11, "Success",
    datetime(2021-06-05), 15, "Success",
    datetime(2021-06-06), 2, "Success",
    datetime(2021-06-07), 19, "Error",
    datetime(2021-06-08), 12, "Success",
    datetime(2021-06-09), 7, "Success",
    datetime(2021-06-10), 9, "Success",
    datetime(2021-06-11), 4, "Success",
    datetime(2021-06-12), 1, "Success",
]
| sort by Timestamp asc 
| evaluate rows_near(State == "Error", 2)

Output

TimestampValueState
2021-06-05 00:00:00.000000015Success
2021-06-06 00:00:00.00000002Success
2021-06-07 00:00:00.000000019Error
2021-06-08 00:00:00.000000012Success
2021-06-09 00:00:00.00000007Success
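The optional RowsAfter argument overrides the number of rows returned after the condition. The following sketch uses the same datatable and returns two rows before and only one row after each "Error" record:

datatable (Timestamp:datetime, Value:long, State:string )
[
    datetime(2021-06-01), 1, "Success",
    datetime(2021-06-02), 4, "Success",
    datetime(2021-06-03), 3, "Success",
    datetime(2021-06-04), 11, "Success",
    datetime(2021-06-05), 15, "Success",
    datetime(2021-06-06), 2, "Success",
    datetime(2021-06-07), 19, "Error",
    datetime(2021-06-08), 12, "Success",
    datetime(2021-06-09), 7, "Success",
    datetime(2021-06-10), 9, "Success",
    datetime(2021-06-11), 4, "Success",
    datetime(2021-06-12), 1, "Success",
]
| sort by Timestamp asc 
| evaluate rows_near(State == "Error", 2, 1) // 2 rows before, 1 row after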

6.10 - sequence_detect plugin

Learn how to use the sequence_detect plugin to detect sequence occurrences based on provided predicates.

Detects sequence occurrences based on provided predicates. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate sequence_detect (TimelineColumn, MaxSequenceStepWindow, MaxSequenceSpan, Expr1, Expr2, …, Dim1, Dim2, …)

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
TimelineColumnstring✔️The column reference representing timeline, must be present in the source expression.
MaxSequenceStepWindowtimespan✔️The value of the max allowed timespan between 2 sequential steps in the sequence.
MaxSequenceSpantimespan✔️The max timespan for the sequence to complete all steps.
Expr1, Expr2, …string✔️The boolean predicate expressions defining sequence steps.
Dim1, Dim2, …string✔️The dimension expressions that are used to correlate sequences.

Returns

Returns a single table where each row in the table represents a single sequence occurrence:

  • Dim1, Dim2, …: dimension columns that were used to correlate sequences.
  • Expr1TimelineColumn, Expr2TimelineColumn, …: Columns with time values, representing the timeline of each sequence step.
  • Duration: the overall sequence time window

Examples

The following query looks at the table T to search for relevant data from a specified time period.

T | evaluate sequence_detect(datetime_column, 10m, 1h, e1 = (Col1 == 'Val'), e2 = (Col2 == 'Val2'), Dim1, Dim2)

Exploring Storm Events

The following query looks at the table StormEvents (weather statistics for 2007) and shows cases where a sequence of ‘Excessive Heat’ was followed by ‘Wildfire’ within 5 days.

StormEvents
| evaluate sequence_detect(
               StartTime,
               5d,  // step max-time
               5d,  // sequence max-time
               heat=(EventType == "Excessive Heat"), 
               wildfire=(EventType == 'Wildfire'), 
               State
           )

Output

Stateheat_StartTimewildfire_StartTimeDuration
CALIFORNIA2007-05-08 00:00:00.00000002007-05-08 16:02:00.000000016:02:00
CALIFORNIA2007-05-08 00:00:00.00000002007-05-10 11:30:00.00000002.11:30:00
CALIFORNIA2007-07-04 09:00:00.00000002007-07-05 23:01:00.00000001.14:01:00
SOUTH DAKOTA2007-07-23 12:00:00.00000002007-07-27 09:00:00.00000003.21:00:00
TEXAS2007-08-10 08:00:00.00000002007-08-11 13:56:00.00000001.05:56:00
CALIFORNIA2007-08-31 08:00:00.00000002007-09-01 11:28:00.00000001.03:28:00
CALIFORNIA2007-08-31 08:00:00.00000002007-09-02 13:30:00.00000002.05:30:00
CALIFORNIA2007-09-02 12:00:00.00000002007-09-02 13:30:00.000000001:30:00

6.11 - session_count plugin

Learn how to use the session_count plugin to calculate the session count based on the ID column over a timeline.

Calculates the session count based on the ID column over a timeline. The plugin is invoked with the evaluate operator.

Syntax

TabularExpression | evaluate session_count(IdColumn, TimelineColumn, Start, End, Bin, LookBackWindow [, dim1, dim2, …])

Parameters

NameTypeRequiredDescription
TabularExpressionstring✔️The tabular expression that serves as input.
IdColumnstring✔️The name of the column with ID values that represents user activity.
TimelineColumnstring✔️The name of the column that represents the timeline.
Startscalar✔️The start of the analysis period.
Endscalar✔️The end of the analysis period.
Binscalar✔️The session’s analysis step period.
LookBackWindowscalar✔️The session lookback period. If the ID from IdColumn appears in a time window within LookBackWindow, the session is considered to be an existing one. If the ID doesn’t appear, then the session is considered to be new.
dim1, dim2, …stringA list of the dimensions columns that slice the session count calculation.

Returns

Returns a table that has the session count values for each timeline period and for each existing dimensions combination.

Output table schema is:

TimelineColumndim1..dim_ncount_sessions
type: as of TimelineColumn......long

Examples

For this example, the data is deterministic, and we use a table with two columns:

  • Timeline: a running number from 1 to 10,000
  • Id: ID of the user from 1 to 50

Id appears at a specific Timeline slot if it divides Timeline evenly (Timeline % Id == 0).

An event with Id==1 appears at every Timeline slot, an event with Id==2 at every second Timeline slot, and so on.

Here are 20 lines of the data:

let _data = range Timeline from 1 to 10000 step 1
    | extend __key = 1
    | join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
    | where Timeline % Id == 0
    | project Timeline, Id;
// Look on few lines of the data
_data
| order by Timeline asc, Id asc
| take 20

Output

TimelineId
11
21
22
31
33
41
42
44
51
55
61
62
63
66
71
77
81
82
84
88

Let’s define a session in the following terms: a session is considered active as long as the user (Id) appears at least once within a timeframe of 100 time slots, and the session lookback window is 41 time slots.

The next query shows the count of active sessions according to the above definition.

let _data = range Timeline from 1 to 9999 step 1
    | extend __key = 1
    | join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
    | where Timeline % Id == 0
    | project Timeline, Id;
// End of data definition
_data
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41)
| render linechart 

Example session count.
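The dimension arguments slice the session count calculation. The following sketch adds a hypothetical Parity column to the generated data and counts sessions per parity value:

let _data = range Timeline from 1 to 9999 step 1
    | extend __key = 1
    | join kind=inner (range Id from 1 to 50 step 1 | extend __key=1) on __key
    | where Timeline % Id == 0
    | project Timeline, Id;
// End of data definition
_data
| extend Parity = iff(Id % 2 == 0, "even", "odd") // hypothetical dimension column
| evaluate session_count(Id, Timeline, 1, 10000, 100, 41, Parity)
| render linechart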

6.12 - sliding_window_counts plugin

Learn how to use the sliding_window_counts plugin to calculate counts and distinct counts of values in a sliding window over a lookback period.

Calculates counts and distinct count of values in a sliding window over a lookback period, using the technique described in the Perform aggregations over a sliding window example. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate sliding_window_counts(IdColumn, TimelineColumn, Start, End, LookbackWindow, Bin , [dim1, dim2, …])

Parameters

NameTypeRequiredDescription
Tstring✔️The input tabular expression.
IdColumnstring✔️The name of the column with ID values that represent user activity.
TimelineColumnstring✔️The name of the column representing the timeline.
Startint, long, real, datetime, or timespan✔️The analysis start period.
Endint, long, real, datetime, or timespan✔️The analysis end period.
LookbackWindowint, long, real, datetime, or timespan✔️The lookback period. This value should be a multiple of the Bin value, otherwise the LookbackWindow will be rounded down to a multiple of the Bin value. For example, for dcount users in past 7d: LookbackWindow = 7d.
Binint, long, real, datetime, timespan, or string✔️The analysis step period. The possible string values are week, month, and year for which all periods will be startofweek, startofmonth, startofyear respectively.
dim1, dim2, …stringA list of the dimensions columns that slice the activity metrics calculation.

Returns

Returns a table that has the count and distinct count values of Ids in the lookback period, for each timeline period (by bin) and for each existing dimensions combination.

Output table schema is:

TimelineColumndim1..dim_ncountdcount
type: as of TimelineColumn......longlong

Example

Calculate counts and dcounts for users over a three-day lookback period, for each day in the analysis period.

let start = datetime(2017-08-01);
let end = datetime(2017-08-07); 
let lookbackWindow = 3d;  
let bin = 1d;
let T = datatable(UserId: string, Timestamp: datetime)
    [
    'Bob', datetime(2017-08-01), 
    'David', datetime(2017-08-01), 
    'David', datetime(2017-08-01), 
    'John', datetime(2017-08-01), 
    'Bob', datetime(2017-08-01), 
    'Ananda', datetime(2017-08-02),  
    'Atul', datetime(2017-08-02), 
    'John', datetime(2017-08-02), 
    'Ananda', datetime(2017-08-03), 
    'Atul', datetime(2017-08-03), 
    'Atul', datetime(2017-08-03), 
    'John', datetime(2017-08-03), 
    'Bob', datetime(2017-08-03), 
    'Betsy', datetime(2017-08-04), 
    'Bob', datetime(2017-08-05), 
];
T
| evaluate sliding_window_counts(UserId, Timestamp, start, end, lookbackWindow, bin)

Output

TimestampCountdcount
2017-08-01 00:00:00.000000053
2017-08-02 00:00:00.000000085
2017-08-03 00:00:00.0000000135
2017-08-04 00:00:00.000000095
2017-08-05 00:00:00.000000075
2017-08-06 00:00:00.000000022
2017-08-07 00:00:00.000000011

6.13 - User Analytics

This article describes User Analytics.

This section describes Kusto extensions (plugins) for user analytics scenarios.

ScenarioPluginDetailsUser Experience
Counting new users over timeactivity_counts_metricsReturns counts/dcounts/new counts for each time window. Each time window is compared to all previous time windowsKusto.Explorer: Report Gallery
Period-over-period: retention/churn rate and new usersactivity_metricsReturns dcount, retention/churn rate for each time window. Each time window is compared to previous time windowKusto.Explorer: Report Gallery
Users count and dcount over sliding windowsliding_window_countsFor each time window, returns count and dcount over a lookback period, in a sliding window manner
New-users cohort: retention/churn rate and new usersnew_activity_metricsCompares between cohorts of new users (all users that were first seen in time window). Each cohort is compared to all prior cohorts. Comparison takes into account all previous time windowsKusto.Explorer: Report Gallery
Active Users: distinct countsactive_users_countReturns distinct users for each time window. A user is only considered if it appears in at least X distinct periods in a specified lookback period.
User Engagement: DAU/WAU/MAUactivity_engagementCompares between an inner time window (for example, daily) and an outer (for example, weekly) for computing engagement (for example, DAU/WAU)Kusto.Explorer: Report Gallery
Sessions: count active sessionssession_countCounts sessions, where a session is defined by a time period - a user record is considered a new session, if it hasn’t been seen in the lookback period from current record
Funnels: previous and next state sequence analysisfunnel_sequenceCounts distinct users who have taken a sequence of events, and the previous or next events that led or were followed by the sequence. Useful for constructing sankey diagrams
Funnels: sequence completion analysisfunnel_sequence_completionComputes the distinct count of users that have completed a specified sequence in each time window
