autocluster plugin

Learn how to use the autocluster plugin to find common patterns in data.

autocluster finds common patterns of discrete attributes (dimensions) in the data. It then reduces the results of the original query, whether it’s 100 or 100,000 rows, to a few patterns. The plugin was developed to help analyze failures (such as exceptions or crashes) but can potentially work on any filtered dataset. The plugin is invoked with the evaluate operator.

Syntax

T | evaluate autocluster ([SizeWeight [, WeightColumn [, NumSeeds [, CustomWildcard [, … ]]]]])

Parameters

The parameters must be passed in the order shown in the syntax. To use the default value for a parameter, pass the tilde string '~' in its place. For more information, see Examples.

| Name | Type | Required | Description |
|---|---|---|---|
| T | string | ✔️ | The input tabular expression. |
| SizeWeight | double | | A double between 0 and 1 that controls the balance between generic (high coverage) and informative (many shared) values. Increasing this value typically reduces the number of patterns while expanding coverage. Conversely, decreasing this value generates more specific patterns with more shared values and a smaller percentage coverage. The default is 0.5. The formula is a weighted geometric mean with weights SizeWeight and 1-SizeWeight. |
| WeightColumn | string | | Considers each row in the input according to the specified weight. Each row has a default weight of 1. The argument must be the name of a numeric integer column. A common use of a weight column is to account for sampling, bucketing, or aggregation of the data that is already embedded in each row. |
| NumSeeds | int | | Determines the number of initial local search points. Adjusting the number of seeds affects the quantity or quality of results, depending on the structure of the data. Increasing the number of seeds can improve results at the cost of a slower query. Decreasing below five yields negligible improvements, while increasing above 50 rarely generates more patterns. The default is 25. |
| CustomWildcard | string | | A type literal that sets the wildcard value for a specific type in the results table, indicating no restriction on this column. The default is null, which represents an empty string. If the default wildcard is also a legitimate value in the data, use a different wildcard value, such as *. You can include multiple custom wildcards by adding them consecutively. |
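
As an illustration of the WeightColumn parameter, the following sketch pre-aggregates the sample StormEvents table so that each row stands for several original events, then passes the per-row count as the weight. The column name EventCount is hypothetical and chosen only for this example, and passing the column name as a quoted string is an assumption based on the string type listed in the table above.

```kusto
// Collapse identical (State, EventType, Damage) combinations into one row each.
// EventCount (a hypothetical name) records how many original rows each one represents.
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO")
| summarize EventCount = count() by State, EventType, Damage
// '~' keeps the default SizeWeight; 'EventCount' is used as the row weight.
| evaluate autocluster('~', 'EventCount')
```

Without the weight column, each aggregated row would count as a single row, so the reported counts and percentages would describe distinct combinations rather than the underlying events.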

Returns

The autocluster plugin usually returns a small set of patterns. The patterns capture portions of the data with shared common values across multiple discrete attributes. Each pattern in the results is represented by a row.

The first column is the segment ID. The next two columns are the count and percentage of rows from the original query that are captured by the pattern. The remaining columns are from the original query. Each of their values is either a specific value from the column or a wildcard value (null by default), which means that the pattern places no restriction on that column.

The patterns aren’t distinct, may be overlapping, and usually don’t cover all the original rows. Some rows may not fall under any pattern.

Examples

Using evaluate

T | evaluate autocluster()

Using autocluster

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)

Output

| SegmentId | Count | Percent | State | EventType | Damage |
|---|---|---|---|---|---|
| 0 | 2278 | 38.7 | | Hail | NO |
| 1 | 512 | 8.7 | | Thunderstorm Wind | YES |
| 2 | 898 | 15.3 | TEXAS | | |
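
To see how a pattern row maps back to the data, the following sketch turns segment 0 from the output above (EventType Hail, Damage NO, any State) into an explicit filter. It assumes that a wildcard column simply places no condition on the rows, as described in Returns; under that reading, the count should match the Count reported for segment 0.

```kusto
// Rebuild the same input as the autocluster example.
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO")
// Segment 0: fixed values for EventType and Damage, no filter on State (wildcard).
| where EventType == "Hail" and Damage == "NO"
| count
```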

Using custom wildcards

StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')

Output

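| SegmentId | Count | Percent | State | EventType | Damage |
|---|---|---|---|---|---|
| 0 | 2278 | 38.7 | * | Hail | NO |
| 1 | 512 | 8.7 | * | Thunderstorm Wind | YES |
| 2 | 898 | 15.3 | TEXAS | * | * |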