String operators

Learn about query operators for searching string data types.

Kusto Query Language (KQL) offers various query operators for searching string data types. This article describes how string terms are indexed, lists the string query operators, and gives tips for optimizing performance.

Understanding string terms

Kusto indexes all columns, including columns of type string. Multiple indexes are built for such columns, depending on the actual data. These indexes aren’t directly exposed, but they are used in queries by the string operators whose names include “has”, such as has, !has, hasprefix, and !hasprefix. The semantics of these operators are dictated by the way the column is encoded. Instead of doing a “plain” substring match, these operators match terms.

What is a term?

By default, each string value is broken into maximal sequences of alphanumeric characters, and each of those sequences is made into a term.

For example, in the following string, the terms are Kusto, KustoExplorerQueryRun, and the following substrings: ad67d136, c1db, 4f9f, 88ef, d94f3b6b0b5a.

Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a;KustoExplorerQueryRun

Kusto builds a term index consisting of all terms that are three characters or more, and this index is used by string operators such as has, !has, and so on. If the query looks for a term that is shorter than three characters, or uses a contains operator, then the query falls back to scanning the values in the column. Scanning is much slower than looking up the term in the term index.
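
The difference between term matching and substring matching can be seen on the example string above. The following is a minimal sketch: the one-row table and its Message column are hypothetical and built inline with datatable, and the extract_all regular expression only approximates the default term-splitting rule for ASCII text.

```kusto
// Hypothetical one-row table used only for illustration.
datatable(Message: string)
[
    "Kusto: ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a;KustoExplorerQueryRun"
]
| extend
    // Approximates the term list: maximal runs of ASCII letters and digits.
    ApproxTerms = extract_all(@"([A-Za-z0-9]+)", Message),
    // true: "Kusto" is a whole term.
    HasKusto = Message has "Kusto",
    // true: "c1db" is a whole term between two hyphens.
    HasGuidPart = Message has "c1db",
    // false: "Explorer" is only part of the term KustoExplorerQueryRun.
    HasExplorer = Message has "Explorer",
    // true: contains does a plain substring match instead of a term match.
    ContainsExplorer = Message contains "Explorer"
```

Only the has comparisons can be answered from the term index. The contains comparison has to scan the column values.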

Operators on strings

The following abbreviations are used in this article:

  • RHS = right hand side of the expression
  • LHS = left hand side of the expression

Operators with an _cs suffix are case sensitive.

| Operator | Description | Case-sensitive | Example (yields true) |
|---|---|---|---|
| == | Equals | Yes | "aBc" == "aBc" |
| != | Not equals | Yes | "abc" != "ABC" |
| =~ | Equals | No | "abc" =~ "ABC" |
| !~ | Not equals | No | "aBc" !~ "xyz" |
| contains | RHS occurs as a substring of LHS | No | "FabriKam" contains "BRik" |
| !contains | RHS doesn’t occur in LHS | No | "Fabrikam" !contains "xyz" |
| contains_cs | RHS occurs as a substring of LHS | Yes | "FabriKam" contains_cs "Kam" |
| !contains_cs | RHS doesn’t occur in LHS | Yes | "Fabrikam" !contains_cs "Kam" |
| endswith | LHS ends with RHS | No | "Fabrikam" endswith "Kam" |
| !endswith | LHS doesn’t end with RHS | No | "Fabrikam" !endswith "brik" |
| endswith_cs | LHS ends with RHS | Yes | "Fabrikam" endswith_cs "kam" |
| !endswith_cs | LHS doesn’t end with RHS | Yes | "Fabrikam" !endswith_cs "brik" |
| has | RHS is a whole term in LHS | No | "North America" has "america" |
| !has | RHS isn’t a whole term in LHS | No | "North America" !has "amer" |
| has_all | Same as has, but matches only if all of the elements are found | No | "North and South America" has_all("south", "north") |
| has_any | Same as has, but matches if any of the elements is found | No | "North America" has_any("south", "north") |
| has_cs | RHS is a whole term in LHS | Yes | "North America" has_cs "America" |
| !has_cs | RHS isn’t a whole term in LHS | Yes | "North America" !has_cs "amer" |
| hasprefix | RHS is a term prefix in LHS | No | "North America" hasprefix "ame" |
| !hasprefix | RHS isn’t a term prefix in LHS | No | "North America" !hasprefix "mer" |
| hasprefix_cs | RHS is a term prefix in LHS | Yes | "North America" hasprefix_cs "Ame" |
| !hasprefix_cs | RHS isn’t a term prefix in LHS | Yes | "North America" !hasprefix_cs "CA" |
| hassuffix | RHS is a term suffix in LHS | No | "North America" hassuffix "ica" |
| !hassuffix | RHS isn’t a term suffix in LHS | No | "North America" !hassuffix "americ" |
| hassuffix_cs | RHS is a term suffix in LHS | Yes | "North America" hassuffix_cs "ica" |
| !hassuffix_cs | RHS isn’t a term suffix in LHS | Yes | "North America" !hassuffix_cs "icA" |
| in | Equals any of the elements | Yes | "abc" in ("123", "345", "abc") |
| !in | Doesn’t equal any of the elements | Yes | "bca" !in ("123", "345", "abc") |
| in~ | Equals any of the elements | No | "Abc" in~ ("123", "345", "abc") |
| !in~ | Doesn’t equal any of the elements | No | "bCa" !in~ ("123", "345", "ABC") |
| matches regex | LHS contains a match for RHS | Yes | "Fabrikam" matches regex "b.*k" |
| startswith | LHS starts with RHS | No | "Fabrikam" startswith "fab" |
| !startswith | LHS doesn’t start with RHS | No | "Fabrikam" !startswith "kam" |
| startswith_cs | LHS starts with RHS | Yes | "Fabrikam" startswith_cs "Fab" |
| !startswith_cs | LHS doesn’t start with RHS | Yes | "Fabrikam" !startswith_cs "fab" |
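
As a rough illustration of how these operators combine in a query, the following sketch filters and annotates a small inline table. The table and its Region column are hypothetical and exist only for this example.

```kusto
// Hypothetical inline table used only for illustration.
datatable(Region: string)
[
    "North America",
    "South America",
    "north africa"
]
// Case-insensitive term match: keeps rows that have the term "north" or "south".
| where Region has_any ("north", "south")
| extend
    // Case-sensitive prefix check: true only for "North America".
    StartsWithNorthCs = Region startswith_cs "North",
    // Case-insensitive membership test against a list of full values.
    IsKnownRegion = Region in~ ("north america", "south america"),
    // Term-prefix check: true when some term in the value starts with "afr".
    HasAfrPrefix = Region hasprefix "afr"
```

All three rows pass the where clause, because has_any ignores case. The _cs variant in the extend step then distinguishes "North" from "north".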

Performance tips

For better performance, when there are two operators that do the same task, use the case-sensitive one. For example:

  • Use ==, not =~
  • Use in, not in~
  • Use hassuffix_cs, not hassuffix
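
The first tip could look like the following sketch. It assumes the StormEvents sample table used elsewhere in this article, and it assumes that State values are stored in upper case. Check the casing in your own data before switching to a case-sensitive operator.

```kusto
// Assumption: StormEvents is the sample table and State values are upper case.
// Preferred when the exact casing is known: case-sensitive equality.
StormEvents
| where State == "TEXAS"
| count;
// Returns the same count under that assumption, but compares case-insensitively.
StormEvents
| where State =~ "texas"
| count
```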

For faster results, if you’re testing for the presence of a symbol or alphanumeric word that is bound by non-alphanumeric characters, or the start or end of a field, use has or in. has works faster than contains, startswith, or endswith.

To search for IPv4 addresses or their prefixes, use one of the special operators on IPv4 addresses, which are optimized for this purpose.

For more information, see Query best practices.

For example, the first of the following queries runs faster than the second, because has can use the term index while contains must scan the column values:

StormEvents | where State has "North" | count;
StormEvents | where State contains "nor" | count

Operators on IPv4 addresses

The following group of operators provides index-accelerated search on IPv4 addresses or their prefixes.

| Operator | Description | Example (yields true) |
|---|---|---|
| has_ipv4 | LHS contains IPv4 address represented by RHS | has_ipv4("Source address is 10.1.2.3:1234", "10.1.2.3") |
| has_ipv4_prefix | LHS contains an IPv4 address that matches a prefix represented by RHS | has_ipv4_prefix("Source address is 10.1.2.3:1234", "10.1.2.") |
| has_any_ipv4 | LHS contains one of IPv4 addresses provided by RHS | has_any_ipv4("Source address is 10.1.2.3:1234", dynamic(["10.1.2.3", "127.0.0.1"])) |
| has_any_ipv4_prefix | LHS contains an IPv4 address that matches one of prefixes provided by RHS | has_any_ipv4_prefix("Source address is 10.1.2.3:1234", dynamic(["10.1.2.", "127.0.0."])) |
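
Applied to a column instead of a literal, these functions might be used as in the following sketch. The inline table, its Message column, and the sample addresses are hypothetical.

```kusto
// Hypothetical inline table of log messages, used only for illustration.
datatable(Message: string)
[
    "Source address is 10.1.2.3:1234",
    "Source address is 192.168.1.10:443",
    "No address here"
]
| extend
    // true only for the first row.
    MatchesAddress = has_ipv4(Message, "10.1.2.3"),
    // true only for the second row; the prefix ends with a dot.
    MatchesPrefix = has_ipv4_prefix(Message, "192.168.1."),
    // true only for the first row; 127.0.0.1 appears in no message.
    MatchesAny = has_any_ipv4(Message, dynamic(["10.1.2.3", "127.0.0.1"])),
    // true for the first and second rows.
    MatchesAnyPrefix = has_any_ipv4_prefix(Message, dynamic(["10.1.2.", "192.168."]))
```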