diffpatterns_text plugin

Learn how to use the diffpatterns_text plugin to compare two string value datasets to find the differences between the two datasets.

Compares two datasets of string values and finds text patterns that characterize differences between the two datasets. The plugin is invoked with the evaluate operator.

The diffpatterns_text returns a set of text patterns that capture different portions of the data in the two sets. For example, a pattern capturing a large percentage of the rows when the condition is true and low percentage of the rows when the condition is false. The patterns are built from consecutive tokens separated by white space, with a token from the text column or a * representing a wildcard. Each pattern is represented by a row in the results.

Syntax

T | evaluate diffpatterns_text(TextColumn, BooleanCondition [, MinTokens, Threshold , MaxTokens])

Parameters

Name	Type	Required	Description
TextColumn	`string`	✔️	The text column to analyze.
BooleanCondition	`string`	✔️	An expression that evaluates to a boolean value. The algorithm splits the query into the two datasets to compare based on this expression.
MinTokens	`int`		An integer value between 0 and 200 that represents the minimal number of non-wildcard tokens per result pattern. The default is 1.
Threshold	`decimal`		A decimal value between 0.015 and 1 that sets the minimal pattern ratio difference between the two sets. Default is 0.05. See diffpatterns.
MaxTokens	`int`		An integer value between 0 and 20 that sets the maximal number of tokens per result pattern, specifying a lower limit decreases the query runtime.

Returns

The result of diffpatterns_text returns the following columns:

Count_of_True: The number of rows matching the pattern when the condition is true.
Count_of_False: The number of rows matching the pattern when the condition is false.
Percent_of_True: The percentage of rows matching the pattern from the rows when the condition is true.
Percent_of_False: The percentage of rows matching the pattern from the rows when the condition is false.
Pattern: The text pattern containing tokens from the text string and ‘*’ for wildcards.

Examples

The following example shows how to use the diffpatterns_text plugin to find patterns in the EpisodeNarrative column of the StormEvents table. The example compares the text patterns of the EpisodeNarrative column when the EventType is “Extreme Cold/Wind Chill” and when it is not.

The following example uses data from the StormEvents table in the help cluster. To access this data, sign in to https://dataexplorer.azure.com/clusters/help/databases/Samples. In the left menu, browse to help > Samples > Tables > Storm_Events.

The examples in this tutorial use the StormEvents table, which is publicly available in the Weather analytics sample data.

StormEvents     
| where EventNarrative != "" and monthofyear(StartTime) > 1 and monthofyear(StartTime) < 9
| where EventType == "Drought" or EventType == "Extreme Cold/Wind Chill"
| evaluate diffpatterns_text(EpisodeNarrative, EventType == "Extreme Cold/Wind Chill", 2)

Output

Count_of_True	Count_of_False	Percent_of_True	Percent_of_False	Pattern
11	0	6.29	0	Winds shifting northwest in * wake * a surface trough brought heavy lake effect snowfall downwind * Lake Superior from
9	0	5.14	0	Canadian high pressure settled * * region * produced the coldest temperatures since February * 2006. Durations * freezing temperatures
0	34	0	6.24	* * * * * * * * * * * * * * * * * * West Tennessee,
0	42	0	7.71	* * * * * * caused * * * * * * * * across western Colorado. *
0	45	0	8.26	* * below normal *
0	110	0	20.18	Below normal *

Feedback

Was this page helpful?

Glad to hear it! Please tell us how we can improve.

Sorry to hear that. Please tell us how we can improve.