reduce operator

Learn how to use the reduce operator to group a set of strings together based on value similarity.

Groups a set of strings together based on value similarity.

For each such group, the operator returns a pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard. The count is the number of values in the group, and the representative is one of the original values in the group.

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Parameters

NameTypeRequiredDescription
Exprstring✔️The value by which to reduce.
ThresholdrealA value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1.

The threshold parameter determines the minimum level of similarity required for values to be grouped together. With a smaller threshold value (closer to 0), more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value (closer to 1) requires less similarity, resulting in more groups that are less similar. We recommend setting a small threshold value for large inputs. See Examples.
ReduceKindstringThe only valid value is source. If source is specified, the operator appends the Pattern column to the existing rows in the table instead of aggregating by Pattern.

Returns

A table with as many rows as there are groups and columns titled pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard, or placeholder for an arbitrary insertion string. The count is the number of values in the group, and the representative is one of the original values in the group.

For example, the result of reduce by city might include:

PatternCountRepresentative
San *5182San Bernard
Saint *2846Saint Lucy
Moscow3726Moscow
* -on- *2730One -on- One
Paris2716Paris

Examples

The example in this section shows how to use the syntax to help you get started.

The following example generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters. The threshold is set to 0.001, which means that the operator groups values that are very similar to each other.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.001 , characters = "X" 

Output

PatternCountRepresentative
MachineLearning*1000MachineLearningX4

The following example generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters. The threshold is set to 0.9, which means that the operator groups together values less strictly and allows more variance.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.9 , characters = "X" 

Output

The result includes only those groups where the MyText value appears in at least 90% of the rows.

PatternCountRepresentative
MachineLearning*177MachineLearningX9
MachineLearning*102MachineLearningX0
MachineLearning*106MachineLearningX1
MachineLearning*96MachineLearningX6
MachineLearning*110MachineLearningX4
MachineLearning*100MachineLearningX3
MachineLearning*99MachineLearningX8
MachineLearning*104MachineLearningX7
MachineLearning*106MachineLearningX2

If the Characters parameter is unspecified, by default the operator treats all non-alphanumeric characters (including spaces and punctuation) as term separators. The following example shows how the reduce operator behaves when the Characters parameter isn’t specified.

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

PatternCountRepresentative
others10

However, if you specify that “Z” is a separator, then it’s as if each value in str is two terms: foo and tostring(x):

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

Output

PatternCountRepresentative
foo*10fooZ1

The following example shows how one might apply the reduce operator to a “sanitized” input, in which GUIDs in the column being reduced are replaced before reducing:

Start with a few records from the Trace table. Then reduce the Text column which includes random GUIDs. As random GUIDs interfere with the reduce operation, replace them all by the string “GUID”. Now perform the reduce operation. In case there are other “quasi-random” identifiers with embedded ‘-’ or ‘_’ characters in them, treat characters as non-term-breakers.

Trace
| take 10000
| extend Text = replace(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID", Text)
| reduce by Text with characters="-_"