reduce operator

Learn how to use the reduce operator to group a set of strings together based on value similarity.

Groups a set of strings together based on value similarity.

For each such group, the operator returns a pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard. The count is the number of values in the group, and the representative is one of the original values in the group.

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Parameters

Name	Type	Required	Description
Expr	`string`	✔️	The value by which to reduce.
Threshold	`real`		A value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1. We recommend setting a small threshold value for large inputs. With a smaller threshold value, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value requires less similarity, resulting in more groups that are less similar. See Examples.
ReduceKind	`string`		The only valid value is `source`. If `source` is specified, the operator appends the `Pattern` column to the existing rows in the table instead of aggregating by `Pattern`.

Returns

A table with as many rows as there are groups and columns titled pattern, count, and representative. The pattern best describes the group, in which the * character represents a wildcard, or placeholder for an arbitrary insertion string. The count is the number of values in the group, and the representative is one of the original values in the group.

For example, the result of reduce by city might include:

Pattern	Count	Representative
San *	5182	San Bernard
Saint *	2846	Saint Lucy
Moscow	3726	Moscow
* -on- *	2730	One -on- One
Paris	2716	Paris

Examples

The example in this section shows how to use the syntax to help you get started.

Small threshold value

This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.001 , characters = "X"

Output

Pattern	Count	Representative
MachineLearning*	1000	MachineLearningX4

Large threshold value

This query generates a range of numbers, creates a new column with concatenated strings and random integers, and then groups the rows by the new column with specific reduction parameters.

range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.9 , characters = "X"

Output

The result includes only those groups where the MyText value appears in at least 90% of the rows.

Pattern	Count	Representative
MachineLearning*	177	MachineLearningX9
MachineLearning*	102	MachineLearningX0
MachineLearning*	106	MachineLearningX1
MachineLearning*	96	MachineLearningX6
MachineLearning*	110	MachineLearningX4
MachineLearning*	100	MachineLearningX3
MachineLearning*	99	MachineLearningX8
MachineLearning*	104	MachineLearningX7
MachineLearning*	106	MachineLearningX2

Behavior of `Characters` parameter

If the Characters parameter is unspecified, then every non-ascii numeric character becomes a term separator.

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

Pattern	Count	Representative
others	10

However, if you specify that “Z” is a separator, then it’s as if each value in str is two terms: foo and tostring(x):

range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

Output

Pattern	Count	Representative
foo*	10	fooZ1

Apply `reduce` to sanitized input

The following example shows how one might apply the reduce operator to a “sanitized” input, in which GUIDs in the column being reduced are replaced before reducing:

Start with a few records from the Trace table. Then reduce the Text column which includes random GUIDs. As random GUIDs interfere with the reduce operation, replace them all by the string “GUID”. Now perform the reduce operation. In case there are other “quasi-random” identifiers with embedded ‘-’ or ‘_’ characters in them, treat characters as non-term-breakers.

Trace
| take 10000
| extend Text = replace(@"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}", "GUID", Text)
| reduce by Text with characters="-_"

autocluster

Feedback

Was this page helpful?

Glad to hear it! Please tell us how we can improve.

Sorry to hear that. Please tell us how we can improve.

reduce operator

Syntax

Parameters

Returns

Examples

Small threshold value

Large threshold value

Behavior of Characters parameter

Apply reduce to sanitized input

Related content

Feedback

Behavior of `Characters` parameter

Apply `reduce` to sanitized input