Limits and Errors
1 - Query Resource Consumption
This document describes the resource consumption information returned as part of a Kusto query response.
When executing a query, the service returns not only the query results but also detailed information about the resources consumed during query execution.
Understanding query resource consumption data helps in optimizing query performance, identifying bottlenecks, and planning for appropriate resource allocation. By monitoring these metrics over time, you can make informed decisions about query design, cluster configuration, and data organization to ensure optimal performance and cost-efficiency of your Kusto queries.
The resource consumption data is returned in the QueryResourceConsumption object as part of the query response, typically in JSON format. You can find the object in Monitoring tools or by Programmatic access.
Structure of the QueryResourceConsumption object
The QueryResourceConsumption object typically includes the following main sections:
- `QueryHash`: A unique identifier for the query structure. This hash represents the query without its literal values, allowing similar query patterns to be identified even when the specific literal values differ. For example, the queries `Events | where Timestamp > datetime(2023-01-01)` and `Events | where Timestamp > datetime(2023-02-01)` have the same QueryHash, as they share the same structure and differ only in their literal datetime values.
- `ExecutionTime`: Total execution time in seconds
- `resource_usage`: Detailed breakdown of resources used
- `input_dataset_statistics`: Statistics about the data inputs processed
- `dataset_statistics`: Statistics about the resulting dataset
- `cross_cluster_resource_usage`: Information about resources used across clusters, where relevant
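To make these fields concrete, the following is a minimal sketch of reading the top-level fields from a QueryResourceConsumption object in Python. The payload string is an abbreviated illustration, not real service output:

```python
import json

# Abbreviated QueryResourceConsumption payload, invented for illustration.
payload = """
{
  "QueryHash": "add172cd28dde0eb",
  "ExecutionTime": 0.0045931,
  "resource_usage": {"memory": {"peak_per_node": 1580848}},
  "input_dataset_statistics": {"rows": {"total": 59066, "scanned": 59066}}
}
"""

qrc = json.loads(payload)
print(qrc["QueryHash"])                                  # query-structure hash
print(qrc["ExecutionTime"])                              # seconds
print(qrc["resource_usage"]["memory"]["peak_per_node"])  # bytes
```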
Resource usage details
The resource usage section provides detailed information about the resources consumed during query execution. It includes the following subsections:
- `resource_usage.cache.shards`: Information about cache usage
- `resource_usage.cpu`: Information about CPU usage
- `resource_usage.memory`: Information about memory usage
- `resource_usage.network`: Information about network usage
- `input_dataset_statistics`: Information about the input dataset
Cache usage
The resource_usage.cache.shards section provides information about how the query utilized the cache:
| Object | Property | Description |
|---|---|---|
| hot | | Data served from the hot cache |
| | hitbytes | Amount of data successfully retrieved from hot cache, in bytes |
| | missbytes | Amount of data not found in hot cache, in bytes |
| | retrievebytes | Amount of data retrieved from storage to satisfy misses, in bytes |
| cold | | Data served from the cold cache |
| | hitbytes | Amount of data successfully retrieved from cold cache, in bytes |
| | missbytes | Amount of data not found in cold cache, in bytes |
| | retrievebytes | Amount of data retrieved from storage to satisfy misses, in bytes |
| bypassbytes | | Amount of data that bypassed the cache, in bytes |
| results_cache_origin | | Information about the original query whose results were cached and reused |
| | client_request_id | Unique identifier of the original request that populated the cache |
| | started_on | Timestamp when the original query that populated the cache was executed |
| partial_query_results | | Statistics of per-shard level caching, if enabled |
| | hits | Number of shard-level query results found in the cache |
| | misses | Number of shard-level query results missing from the cache |
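One practical use of these fields is computing an overall cache hit ratio. The following is a sketch under the assumption that the `shards` object has the shape shown above; `cache_hit_ratio` is an illustrative helper, not part of any SDK:

```python
def cache_hit_ratio(shards: dict) -> float:
    """Fraction of requested bytes served from the hot or cold cache.

    `shards` is the resource_usage.cache.shards object; returns 1.0
    when no bytes were requested at all.
    """
    hits = shards["hot"]["hitbytes"] + shards["cold"]["hitbytes"]
    misses = shards["hot"]["missbytes"] + shards["cold"]["missbytes"]
    total = hits + misses + shards.get("bypassbytes", 0)
    return hits / total if total else 1.0

# Shards object from the "Data in Hot Cache" sample in this article:
shards = {
    "hot": {"hitbytes": 517324, "missbytes": 0, "retrievebytes": 0},
    "cold": {"hitbytes": 0, "missbytes": 0, "retrievebytes": 0},
    "bypassbytes": 0,
}
print(cache_hit_ratio(shards))  # 1.0 — fully served from the hot cache
```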
CPU usage
The resource_usage.cpu section provides information about CPU consumption:
| Object | Property | Description |
|---|---|---|
| user | | User-mode CPU time |
| kernel | | Kernel-mode CPU time |
| total cpu | | Total CPU time consumed |
| breakdown | | Further breakdown of CPU usage |
| | query execution | CPU time for query execution |
| | query planning | CPU time for query planning |
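The CPU values are timespan strings in `hh:mm:ss` format, with optional fractional seconds. When post-processing the JSON, a small helper like this sketch can convert them to seconds (`parse_cpu_time` is an illustrative helper, not part of any SDK):

```python
from datetime import timedelta

def parse_cpu_time(value: str) -> timedelta:
    """Parse a CPU time string such as '01:01:13.5312500' (hh:mm:ss.fffffff)."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))

# "total cpu" value from the external-tables sample in this article:
total = parse_cpu_time("01:01:58.5000000")
print(total.total_seconds())  # 3718.5
```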
Memory usage
The resource_usage.memory section provides information about memory consumption:
| Object | Property | Description |
|---|---|---|
| peak_per_node | | Peak memory usage per node, in bytes |
Network usage
The resource_usage.network section provides information about network usage:
| Object | Property | Description |
|---|---|---|
| inter_cluster_total_bytes | | Total bytes transferred within the cluster |
| cross_cluster_total_bytes | | Total bytes transferred across clusters |
Input dataset statistics
The input_dataset_statistics section provides details about the source data processed:
| Object | Property | Description |
|---|---|---|
| extents | | Information about data extents |
| | total | Total number of extents in all tables referenced by the query |
| | scanned | Number of extents scanned (examined by query nodes) |
| | scanned_min_datetime | Minimum datetime of scanned data |
| | scanned_max_datetime | Maximum datetime of scanned data |
| rows | | Information about data rows |
| | total | Total number of rows in all tables referenced by the query |
| | scanned | Number of rows scanned (examined by query nodes) |
| rowstores | | Information about rowstore data |
| | scanned_rows | Number of rows scanned from rowstores |
| | scanned_values_size | Size of values scanned from rowstores |
| shards | | Information about shard queries |
| | total | Total number of shards in all tables referenced by the query |
| | scanned | Number of shards scanned (examined by query nodes) |
| external_data | | Information about external data (if applicable) |
| | downloaded_items | Number of items downloaded from external data sources |
| | downloaded_bytes | Number of bytes downloaded from external data sources |
| | iterated_artifacts | Number of artifacts iterated from external data sources |
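Comparing `scanned` to `total` is a quick way to gauge how selective a query's filters are. A sketch, assuming the `input_dataset_statistics` shape shown above (`scan_fraction` is an invented helper):

```python
def scan_fraction(stats: dict, kind: str) -> float:
    """Fraction of extents, rows, or shards the query actually scanned.

    `stats` is the input_dataset_statistics object; `kind` is one of
    "extents", "rows", or "shards". Returns 0.0 when the total is zero.
    """
    section = stats[kind]
    return section["scanned"] / section["total"] if section["total"] else 0.0

# Values from the "Data from Cold Cache" sample in this article:
stats = {"extents": {"total": 1250, "scanned": 1},
         "rows": {"total": 50000, "scanned": 40}}
print(scan_fraction(stats, "extents"))  # 0.0008
print(scan_fraction(stats, "rows"))     # 0.0008
```

Fractions close to 1.0 suggest the query scans most of the data and may benefit from tighter time or column filters.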
Integration with monitoring tools
The QueryResourceConsumption data can be collected and analyzed over time to identify trends and anomalies in query performance. This data is available through:
- Query execution results
- `.show queries` command output
- Diagnostic logs exported to monitoring solutions
Monitoring this data can help identify query optimization opportunities and track the impact of changes to your data models or query patterns.
Query execution results
You can access the raw JSON of the QueryResourceConsumption data directly from the query execution results. The information is displayed in the results grid located below the query editor.

1. Run a query in the query pane.
2. Browse to the Stats tab of the query results.
3. View the Raw JSON Preview section and select the View full JSON option to scroll through the raw JSON.
Programmatic access
In client applications, you can access the QueryResourceConsumption information programmatically:
// Example of accessing QueryResourceConsumption in C#
var dataSet = kustoClient.ExecuteQuery(query);
var resourceConsumption = GetQueryResourceConsumption(dataSet.Tables[2], false);
Console.WriteLine($"Execution time: {resourceConsumption.ExecutionTime}");
Console.WriteLine($"Memory peak: {resourceConsumption.ResourceUsage.Memory.PeakPerNode}");
For more information, see Create an app to run management commands.
Examples
Data in Hot Cache: This example shows a query that was served entirely from the hot cache (hitbytes: 517324, missbytes: 0) and had minimal execution time (0.0045931 seconds). All data was found in the hot cache, resulting in very fast query execution.
{
"QueryHash": "add172cd28dde0eb",
"ExecutionTime": 0.0045931,
"resource_usage": {
"cache": {
"shards": {
"hot": {
"hitbytes": 517324,
"missbytes": 0,
"retrievebytes": 0
},
"cold": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"bypassbytes": 0
}
},
"cpu": {
"user": "00:00:00",
"kernel": "00:00:00",
"total cpu": "00:00:00",
"breakdown": {
"query execution": "00:00:00",
"query planning": "00:00:00"
}
},
"memory": {
"peak_per_node": 1580848
},
"network": {
"inter_cluster_total_bytes": 27384,
"cross_cluster_total_bytes": 0
}
},
"input_dataset_statistics": {
"extents": {
"total": 1,
"scanned": 1,
"scanned_min_datetime": "2016-03-17T08:24:02.6259906Z",
"scanned_max_datetime": "2016-03-17T08:24:02.6259906Z"
},
"rows": {
"total": 59066,
"scanned": 59066
},
"rowstores": {
"scanned_rows": 0,
"scanned_values_size": 0
},
"shards": {
"queries_generic": 1,
"queries_specialized": 0
}
},
"dataset_statistics": [
{
"table_row_count": 10,
"table_size": 11473
}
],
"cross_cluster_resource_usage": {}
}
Data from External Tables: This example shows a query that processed external data. Note the high execution time (159.88 seconds) and significant CPU utilization (over 1 hour total CPU time). The external data section shows that 6,709 items were downloaded, totaling approximately 87.7 GB. This is typical for queries that need to fetch large amounts of data from external sources, which is significantly slower than querying data in Kusto’s internal storage.
{
"QueryHash": "529656ef4099485b",
"ExecutionTime": 159.8833962,
"resource_usage": {
"cache": {
"shards": {
"hot": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"cold": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"bypassbytes": 0
}
},
"cpu": {
"user": "01:01:13.5312500",
"kernel": "00:00:44.9687500",
"total cpu": "01:01:58.5000000",
"breakdown": {
"query execution": "01:01:58.5000000",
"query planning": "00:00:00"
}
},
"memory": {
"peak_per_node": 26834528
},
"network": {
"inter_cluster_total_bytes": 6745,
"cross_cluster_total_bytes": 0
}
},
"input_dataset_statistics": {
"extents": {
"total": 0,
"scanned": 0,
"scanned_min_datetime": "0001-01-01T00:00:00.0000000Z",
"scanned_max_datetime": "0001-01-01T00:00:00.0000000Z"
},
"rows": {
"total": 0,
"scanned": 0
},
"rowstores": {
"scanned_rows": 0,
"scanned_values_size": 0
},
"shards": {
"queries_generic": 0,
"queries_specialized": 0
},
"external_data": {
"downloaded_items": 6709,
"downloaded_bytes": 87786879356,
"iterated_artifacts": 6709
}
},
"dataset_statistics": [
{
"table_row_count": 2,
"table_size": 44
}
],
"cross_cluster_resource_usage": {}
}
Data from Cold Cache: This example shows a query that retrieved data from the cold cache (cold.hitbytes: 127209). Note that out of 1,250 total extents, only 1 was scanned, and out of 50,000 total rows, only 40 were scanned. This suggests an efficient query that uses appropriate filtering. Cold cache access is typically slower than hot cache but faster than retrieving data directly from storage.
{
"QueryHash": "480873c9b515cea8",
"ExecutionTime": 1.4233768,
"resource_usage": {
"cache": {
"shards": {
"hot": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"cold": {
"hitbytes": 127209,
"missbytes": 0,
"retrievebytes": 0
},
"bypassbytes": 0
}
},
"cpu": {
"user": "00:00:00",
"kernel": "00:00:00",
"total cpu": "00:00:00",
"breakdown": {
"query execution": "00:00:00",
"query planning": "00:00:00"
}
},
"memory": {
"peak_per_node": 2098656
},
"network": {
"inter_cluster_total_bytes": 250676,
"cross_cluster_total_bytes": 0
}
},
"input_dataset_statistics": {
"extents": {
"total": 1250,
"scanned": 1,
"scanned_min_datetime": "2024-01-08T07:13:13.6172552Z",
"scanned_max_datetime": "2024-01-08T07:13:13.6172552Z"
},
"rows": {
"total": 50000,
"scanned": 40
},
"rowstores": {
"scanned_rows": 0,
"scanned_values_size": 0
},
"shards": {
"queries_generic": 1,
"queries_specialized": 0
}
},
"dataset_statistics": [
{
"table_row_count": 10,
"table_size": 123654
}
],
"cross_cluster_resource_usage": {}
}
Results from Query Cache: This example demonstrates a query served from the query results cache. Note the presence of the results_cache_origin section, which indicates the results were retrieved from a previously cached query result. The extremely fast execution time (0.0039999 seconds) shows the benefit of query results caching, as no data processing was needed. The cache contains information about the original request that populated the cache (client_request_id) and when it was initially executed (started_on). Notice that no data was scanned from extents or rows as indicated by zeros in the input_dataset_statistics section, confirming that the results were retrieved directly from the query cache.
{
"ExecutionTime": 0.0039999,
"resource_usage": {
"cache": {
"shards": {
"hot": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"cold": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"bypassbytes": 0
},
"results_cache_origin": {
"client_request_id": "KE.DF.RunQuery;95b6d241-e684-4a43-91c8-da0d6a854e3e",
"started_on": "2025-04-22T08:22:24.4719143Z"
}
},
"cpu": {
"user": "00:00:00.0156250",
"kernel": "00:00:00",
"total cpu": "00:00:00.0156250",
"breakdown": {
"query execution": "00:00:00.0156250",
"query planning": "00:00:00"
}
},
"memory": {
"peak_per_node": 53440160
},
"network": {
"inter_cluster_total_bytes": 13233,
"cross_cluster_total_bytes": 0
}
},
"input_dataset_statistics": {
"extents": {
"total": 0,
"scanned": 0,
"scanned_min_datetime": "0001-01-01T00:00:00.0000000Z",
"scanned_max_datetime": "0001-01-01T00:00:00.0000000Z"
},
"rows": {
"total": 0,
"scanned": 0
},
"rowstores": {
"scanned_rows": 0,
"scanned_values_size": 0
},
"shards": {
"queries_generic": 0,
"queries_specialized": 0
}
},
"dataset_statistics": [
{
"table_row_count": 10,
"table_size": 12121
}
],
"cross_cluster_resource_usage": {}
}
Results from Partial Query Cache (Per-Shard): This example illustrates a query that benefited from per-shard level caching, as indicated by the partial_query_results section. The cache shows 1 hit and 0 misses, meaning the query was able to retrieve pre-computed results for the shard without having to reprocess the data. Unlike the full query cache example, the input_dataset_statistics shows that data was technically “scanned” (59,066 rows), but this was likely just a metadata operation since the actual computation was retrieved from cache. Note the very fast execution time (0.0047499 seconds), demonstrating the performance advantage of partial query caching. Per-shard caching is particularly useful for queries that repeatedly access the same data partitions with the same filtering conditions.
{
"QueryHash": "da3c6dc30e7b203d",
"ExecutionTime": 0.0047499,
"resource_usage": {
"cache": {
"shards": {
"hot": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"cold": {
"hitbytes": 0,
"missbytes": 0,
"retrievebytes": 0
},
"bypassbytes": 0
},
"partial_query_results": {
"hits": 1,
"misses": 0
}
},
"cpu": {
"user": "00:00:00.0156250",
"kernel": "00:00:00",
"total cpu": "00:00:00.0156250",
"breakdown": {
"query execution": "00:00:00.0156250",
"query planning": "00:00:00"
}
},
"memory": {
"peak_per_node": 1580848
},
"network": {
"inter_cluster_total_bytes": 27428,
"cross_cluster_total_bytes": 0
}
},
"input_dataset_statistics": {
"extents": {
"total": 1,
"scanned": 1,
"scanned_min_datetime": "2016-03-17T08:24:02.6259906Z",
"scanned_max_datetime": "2016-03-17T08:24:02.6259906Z"
},
"rows": {
"total": 59066,
"scanned": 59066
},
"rowstores": {
"scanned_rows": 0,
"scanned_values_size": 0
},
"shards": {
"queries_generic": 0,
"queries_specialized": 0
}
},
"dataset_statistics": [
{
"table_row_count": 10,
"table_size": 11473
}
],
"cross_cluster_resource_usage": {}
}
Related content
2 - Query consistency
Query consistency refers to how queries and updates are synchronized. There are two supported modes of query consistency:
- Strong consistency: Strong consistency ensures immediate access to the most recent updates, such as data appends, deletions, and schema modifications. Strong consistency is the default consistency mode. Due to synchronization, this consistency mode performs slightly worse than weak consistency mode in terms of concurrency.
- Weak consistency: With weak consistency, there may be a delay before query results reflect the latest database updates. Typically, this delay ranges from 1 to 2 minutes. Weak consistency can support higher query concurrency rates than strong consistency.
For example, if 1000 records are ingested each minute into a table in the database, queries over that table running with strong consistency will have access to the most recently ingested records, whereas queries over that table running with weak consistency may not have access to some of the records from the last few minutes.
Use cases for strong consistency
If you have a strong dependency on updates that occurred in the database in the last few minutes, use strong consistency.
For example, the following query counts the number of error records in the last 5 minutes and triggers an alert if that count is larger than 0. This use case is best handled with strong consistency, since your insights may be skewed if you don't have access to records ingested in the past few minutes, as may be the case with weak consistency.
my_table
| where timestamp between(ago(5m)..now())
| where level == "error"
| count
In addition, strong consistency should be used when database metadata is large. For instance, if there are millions of data extents in the database, using weak consistency would result in query heads downloading and deserializing extensive metadata artifacts from persistent storage, which may increase the likelihood of transient failures in downloads and related operations.
Use cases for weak consistency
If you don’t have a strong dependency on updates that occurred in the database in the last few minutes, and you need high query concurrency, use weak consistency.
For example, the following query counts the number of error records per week in the last 90 days. Weak consistency is appropriate in this case, since your insights are unlikely to be impacted if records ingested in the past few minutes are omitted.
my_table
| where timestamp between(ago(90d) .. now())
| where level == "error"
| summarize count() by level, startofweek(timestamp)
Weak consistency modes
The following table summarizes the four modes of weak query consistency.
| Mode | Description |
|---|---|
| Random | Queries are routed randomly to one of the nodes in the cluster that can serve as a weakly consistent query head. |
| Affinity by database | Queries within the same database are routed to the same weakly consistent query head, ensuring consistent execution for that database. |
| Affinity by query text | Queries with the same query text hash are routed to the same weakly consistent query head, which is beneficial for leveraging query caching. |
| Affinity by session ID | Queries with the same session ID hash are routed to the same weakly consistent query head, ensuring consistent execution within a session. |
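Conceptually, each affinity mode hashes its key (database name, query text, or session ID) to pick a weakly consistent query head. The following sketch illustrates only the routing idea; the service's actual hash function and routing logic are internal details, and `zlib.crc32` here is a stand-in:

```python
import zlib

def route(affinity_key: str, num_query_heads: int) -> int:
    """Map an affinity key to a stable query-head index.

    Illustrative only: the service's real hash and routing are internal;
    crc32 is used here as a deterministic stand-in.
    """
    return zlib.crc32(affinity_key.encode()) % num_query_heads

# The same database name always routes to the same query head,
# which is what gives that database a consistent view.
print(route("MyDatabase", 4) == route("MyDatabase", 4))  # True
```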
Affinity by database
The affinity by database mode ensures that queries running against the same database are executed against the same version of the database, although not necessarily the most recent version of the database. This mode is useful when ensuring consistent execution within a specific database is important. However, if there's an imbalance in the number of queries across databases, this mode may result in uneven load distribution.
Affinity by query text
The affinity by query text mode is beneficial when queries leverage the Query results cache. This mode routes repeating queries frequently executed by the same identity to the same query head, allowing them to benefit from cached results and reducing the load on the cluster.
Affinity by session ID
The affinity by session ID mode ensures that queries belonging to the same user activity or session are executed against the same version of the database, although not necessarily the most recent one. To use this mode, the session ID needs to be explicitly specified in each query’s client request properties. This mode is helpful in scenarios where consistent execution within a session is essential.
How to specify query consistency
You can specify the query consistency mode in the client request or by using a server-side policy. If it isn't specified by either, the default mode of strong consistency applies.

- Client sending the request: Use the `queryconsistency` client request property. This method sets the query consistency mode for a specific query and doesn't affect the overall effective consistency mode, which is determined by the default or the server-side policy. For more information, see client request properties.
- Server-side policy: Use the `QueryConsistency` property of the Query consistency policy. This method sets the query consistency mode at the workload group level, which eliminates the need for users to specify the consistency mode in their client request properties and allows for enforcing desired consistency modes. For more information, see Query consistency policy.
Related content
- To customize parameters for queries running with weak consistency, use the Query weak consistency policy.
3 - Query limits
Kusto is an ad-hoc query engine that hosts large datasets and tries to satisfy queries by holding all relevant data in memory. There’s an inherent risk that queries monopolize the service resources without bounds. Kusto provides several built-in protections in the form of default query limits. If you’re considering removing these limits, first determine whether you actually gain any value by doing so.
Limit on request concurrency
Request concurrency is a limit on the number of requests running at the same time.
- The default value of the limit depends on the SKU the database is running on, and is calculated as: `Cores-Per-Node x 10`.
  - For example, for a database that's set up on D14v2 SKU, where each machine has 16 vCores, the default limit is `16 cores x 10 = 160`.
- You can change the default value by configuring the request rate limit policy of the `default` workload group.
  - Various factors affect the actual number of requests that can run concurrently on a database. The most dominant factors are database SKU, the database's available resources, and usage patterns. Configure the policy based on load tests performed on production-like usage patterns.
For more information, see Optimize for high concurrency with Azure Data Explorer.
Limit on result set size (result truncation)
Result truncation is a default limit on the result set returned by the query. Kusto limits the number of records returned to the client to 500,000, and the overall data size for those records to 64 MB. When either of these limits is exceeded, the query fails with a “partial query failure”. Exceeding the overall data size generates an exception with the following message:
The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal data size limit 67108864 (E_QUERY_RESULT_SET_TOO_LARGE).'
Exceeding the number of records fails with an exception that says:
The Kusto DataEngine has failed to execute a query: 'Query result set has exceeded the internal record count limit 500000 (E_QUERY_RESULT_SET_TOO_LARGE).'
You can use several strategies to resolve this error.
- Reduce the result set size by modifying the query to only return interesting data. This strategy is useful when the initial failing query is too “wide”. For example, the query doesn’t project away data columns that aren’t needed.
- Reduce the result set size by shifting post-query processing, such as aggregations, into the query itself. This strategy is useful in scenarios where the output of the query is fed to another processing system, and that system then does other aggregations.
- Switch from queries to using data export when you want to export large sets of data from the service.
- Instruct the service to suppress this query limit by using the `set` statements listed in the following section or flags in client request properties.
Methods for reducing the result set size produced by the query include:
- Use the summarize operator to group and aggregate over similar records in the query output. Potentially sample some columns by using the take_any aggregation function.
- Use a take operator to sample the query output.
- Use the substring function to trim wide free-text columns.
- Use the project operator to drop any uninteresting column from the result set.
You can disable result truncation by using the notruncation request option.
We recommend that some form of limitation is still put in place.
For example:
set notruncation;
MyTable | take 1000000
You can also have more refined control over result truncation
by setting the value of truncationmaxsize (maximum data size in bytes,
defaults to 64 MB) and truncationmaxrecords (maximum number of records,
defaults to 500,000). For example, the following query sets result truncation
to happen at either 1,105 records or 1 MB, whichever is exceeded.
set truncationmaxsize=1048576;
set truncationmaxrecords=1105;
MyTable | where User=="UserId1"
Removing the result truncation limit means that you intend to move bulk data out of Kusto.
You can remove the result truncation limit either for export purposes by using the .export command or for later aggregation. If you choose later aggregation, consider aggregating by using Kusto.
Kusto provides many client libraries that can handle “infinitely large” results by streaming them to the caller. Use one of these libraries, and configure it to streaming mode. For example, use the .NET Framework client (Microsoft.Azure.Kusto.Data) and either set the streaming property of the connection string to true, or use the ExecuteQueryV2Async() call that always streams results. For an example of how to use ExecuteQueryV2Async(), see the HelloKustoV2 application.
You might also find the C# streaming ingestion sample application helpful.
Result truncation is applied by default, not just to the result stream returned to the client. It’s also applied by default to any subquery that one cluster issues to another cluster in a cross-cluster query, with similar effects.
It’s also applied by default to any subquery that one Eventhouse issues to another Eventhouse in a cross-Eventhouse query, with similar effects.
Setting multiple result truncation properties
The following rules apply when you use set statements or specify flags in client request properties.
- If you set `notruncation` but also set `truncationmaxsize`, `truncationmaxrecords`, or `query_take_max_records`, the service ignores `notruncation`.
- If you set `truncationmaxsize`, `truncationmaxrecords`, or `query_take_max_records` more than once, the service uses the lower value for each property.
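The "lower value wins" rule amounts to taking the minimum over every place a property was set. A sketch of that resolution (`resolve_truncation_property` is an invented helper for illustration; the property names are real request options):

```python
from typing import Optional

def resolve_truncation_property(*values: Optional[int]) -> Optional[int]:
    """When a truncation property is set more than once (for example,
    in client request properties and again in a `set` statement),
    the service uses the lower value. Returns None if never set."""
    provided = [v for v in values if v is not None]
    return min(provided) if provided else None

# truncationmaxrecords set to 2000 in client request properties and
# to 1105 in a `set` statement: the lower value, 1105, applies.
print(resolve_truncation_property(2000, 1105))  # 1105
```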
Limit on memory consumed by query operators
Max memory consumption per iterator limit can be configured to control the amount of memory that each query operator consumes, per node. Some query operators, such as join and summarize, hold significant data in memory. By increasing the default of the request option maxmemoryconsumptionperiterator, you can run queries that require more memory per operator.
The maximum supported value for this request option is 32212254720 (30 GB). If you set maxmemoryconsumptionperiterator multiple times, for example in both client request properties and using a set statement, the lower value applies.
When the query reaches the configured memory per operator limit, a partial query failure message displays and includes the text E_RUNAWAY_QUERY.
For example:
The ClusterBy operator has exceeded the memory budget during evaluation. Results might be incorrect or incomplete (E_RUNAWAY_QUERY).
The HashJoin operator has exceeded the memory budget during evaluation. Results might be incorrect or incomplete (E_RUNAWAY_QUERY).
The Sort operator has exceeded the memory budget during evaluation. Results might be incorrect or incomplete (E_RUNAWAY_QUERY).
For example, this query sets the max memory consumption per iterator to 15 GB:
set maxmemoryconsumptionperiterator=16106127360;
MyTable | summarize count() by User
Another limit that might trigger an E_RUNAWAY_QUERY partial query failure is the max accumulated size of
strings held by a single operator. The request option above can’t override this limit.
Runaway query (E_RUNAWAY_QUERY). Aggregation over string column exceeded the memory budget of 8GB during evaluation.
When this limit is exceeded, most likely the relevant query operator is a join, summarize, or make-series.
To work around the limit, modify the query to use the shuffle query strategy. This change is also likely to improve the performance of the query.
In all cases of E_RUNAWAY_QUERY, an additional option (beyond increasing the limit by setting the request option and changing the
query to use a shuffle strategy) is to switch to sampling. Sampling reduces the amount of data processed by the query, and therefore reduces the memory pressure on query operators.
These two queries show how to do the sampling. The first query is a statistical sampling, using a random number generator. The second query is deterministic sampling, done by hashing some column from the dataset, usually some ID.
T | where rand() < 0.1 | ...
T | where hash(UserId, 10) == 1 | ...
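The deterministic form can be mirrored client-side when validating a sampling scheme. In this sketch, `zlib.crc32` stands in for Kusto's `hash()`; the hash functions differ, so the selected subsets won't match the service's, but the property illustrated (the same ID always lands in the same bucket) is the same:

```python
import zlib

def in_sample(user_id: str, modulus: int = 10, bucket: int = 1) -> bool:
    """Deterministic sampling: keep a row when hash(id) % modulus == bucket.

    Mirrors the idea of `T | where hash(UserId, 10) == 1`, using crc32
    as an illustrative stand-in for Kusto's hash function.
    """
    return zlib.crc32(user_id.encode()) % modulus == bucket

# Every row for a given user is either always kept or always dropped,
# so per-user aggregates computed over the sample remain complete.
rows = [f"user{i}" for i in range(100)]
sample = [r for r in rows if in_sample(r)]
print(len(sample))  # roughly 10% of the rows
```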
For more information about using mechanisms such as hint.shufflekey for both summarize and join, see Best practices for Kusto Query Language queries.
Limit on memory per node
Max memory per query per node is another limit used to protect against “runaway” queries. This limit, represented by the request option max_memory_consumption_per_query_per_node, sets an upper bound
on the amount of memory that can be used on a single node for a specific query.
set max_memory_consumption_per_query_per_node=68719476736;
MyTable | ...
If max_memory_consumption_per_query_per_node is set multiple times, for example in both client request properties and using a set statement, the lower value applies.
If the query uses summarize, join, or make-series operators, you can use the shuffle query strategy to reduce memory pressure on a single machine.
Limit execution timeout
Server timeout is a service-side timeout that is applied to all requests. Timeout on running requests (queries and management commands) is enforced at multiple points in Kusto:
- client library (if used)
- service endpoint that accepts the request
- service engine that processes the request
By default, timeout is set to four minutes for queries, and 10 minutes for management commands. This value can be increased if needed (capped at one hour).
- Various client tools support changing the timeout as part of their global or per-connection settings. For example, in Kusto.Explorer, use Tools > Options > Connections > Query Server Timeout.
- Programmatically, SDKs support setting the timeout through the `servertimeout` property. For example, in the .NET SDK this is done through a client request property, by setting a value of type `System.TimeSpan`.
Notes about timeouts
- On the client side, the timeout is applied from the request being created until the time that the response starts arriving to the client. The time it takes to read the payload back at the client isn’t treated as part of the timeout. It depends on how quickly the caller pulls the data from the stream.
- Also on the client side, the actual timeout value used is slightly higher than the server timeout value requested by the user. This difference is to allow for network latencies.
- To automatically use the maximum allowed request timeout, set the client request property `norequesttimeout` to `true`.
Limit on query CPU resource usage
Kusto lets you run queries and use all the available CPU resources that the database has. It attempts to do a fair round-robin between queries if more than one is running. This method yields the best performance for query-defined functions. At other times, you might want to limit the CPU resources used for a particular query. If you run a “background job”, for example, the system might tolerate higher latencies to give concurrent inline queries high priority.
Kusto supports specifying two request properties when running a query. The properties are query_fanout_threads_percent and query_fanout_nodes_percent. Both properties are integers that default to the maximum value (100), but may be reduced for a specific query to some other value.
The first, query_fanout_threads_percent, controls the fanout factor for thread use. When this property is set to 100%, all CPUs are assigned on each node; for example, 16 CPUs on nodes deployed on Azure D14. When this property is set to 50%, half of the CPUs are used, and so on. The numbers are rounded up to a whole CPU, so it's safe to set the property value to 0.
The second, query_fanout_nodes_percent, controls how many of the query nodes to use per subquery distribution operation. It functions in a similar manner.
If query_fanout_nodes_percent or query_fanout_threads_percent is set multiple times (for example, in both the client request properties and a set statement), the lower value for each property applies.
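As an illustration, both properties can be reduced for a single query with set statements (the table T and column Column here are hypothetical names):

```kusto
// Use half the CPUs on each node, and fan out to half the nodes
// per subquery distribution operation, for this query only.
set query_fanout_threads_percent = 50;
set query_fanout_nodes_percent = 50;
T
| summarize count() by Column
```

Because the lower value wins, these set statements can only tighten, never loosen, a cap already supplied in the client request properties.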
Limit on query complexity
During query execution, the query text is transformed into a tree of relational operators representing the query. If the tree depth exceeds an internal threshold, the query is considered too complex for processing, and fails with an error code. The failure indicates that the relational operators tree exceeds its limits.
The following examples show common query patterns that can cause the query to exceed this limit and fail:
- A long list of binary operators chained together. For example:
T
| where Column == "value1" or
Column == "value2" or
.... or
Column == "valueN"
For this specific case, rewrite the query using the in() operator:
T
| where Column in ("value1", "value2".... "valueN")
- A query with a union operator whose schema analysis runs too wide, especially because the default flavor of union returns the “outer” union schema (meaning the output includes the columns of all the underlying tables).
The suggestion in this case is to review the query and reduce the columns it uses.
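One way to narrow the analyzed schema, sketched with hypothetical tables T1 and T2 and hypothetical column names:

```kusto
// kind=inner returns only the columns common to all input tables,
// instead of the default "outer" union of every column.
union kind=inner T1, T2
| project Timestamp, Level, Message  // keep only the columns the query needs
| take 100
```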
4 - Partial query failures
4.1 - Kusto query result set exceeds internal limit
A query result set has exceeded the internal … limit is a kind of partial query failure that happens when the query’s result has exceeded one of two limits:
- A limit on the number of records (record count limit), set by default to 500,000
- A limit on the total amount of data (data size limit), set by default to 67,108,864 bytes (64 MB)
There are several possible courses of action:
- Change the query to consume fewer resources. For example, you can:
- Limit the number of records returned by the query using the take operator or adding additional where clauses.
- Try to reduce the number of columns returned by the query. Use the project operator, the project-away operator, or the project-keep operator.
- Use the summarize operator to get aggregated data.
- Increase the relevant query limit temporarily for that query. For more information, see Result truncation under query limits.
[!NOTE] We don’t recommend that you increase the query limit, since the limits exist to protect the database. The limits make sure that a single query doesn’t disrupt concurrent queries running on the database.
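The preferred mitigations above can be sketched as follows (MyTable and its columns are hypothetical names):

```kusto
MyTable
| where Timestamp > ago(1d)                // extra where clause narrows the input
| summarize EventCount = count() by Level  // aggregate instead of returning raw rows
| take 100                                 // explicit cap on the record count
```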
4.2 - Overflows
An overflow occurs when the result of a computation is too large for the destination type. The overflow usually leads to a partial query failure.
For example, the following query will result in an overflow.
let Weight = 92233720368547758;
range x from 1 to 3 step 1
| summarize percentilesw(x, Weight * 100, 50)
Kusto’s percentilesw() implementation accumulates the Weight expression for values that are “close enough”.
In this case, the accumulation triggers an overflow because it doesn’t fit into a signed 64-bit integer.
Usually, overflows are a result of a “bug” in the query, since Kusto uses 64-bit types for arithmetic computations. The best course of action is to look at the error message, and identify the function or aggregation that triggered the overflow. Make sure the input arguments evaluate to values that make sense.
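For instance, one hedged rewrite of the query above avoids the overflow by scaling the weight down: percentilesw() weights are relative, so scaling them doesn’t change the computed percentile.

```kusto
// Same percentile, but the accumulated weights now fit in a signed 64-bit integer.
let Weight = 92233720368547;
range x from 1 to 3 step 1
| summarize percentilesw(x, Weight, 50)
```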
4.3 - Runaway queries
A runaway query is a kind of partial query failure that happens when some internal query limit was exceeded during query execution.
For example, the following error may be reported:
HashJoin operator has exceeded the memory budget during evaluation. Results may be incorrect or incomplete.
There are several possible courses of action.
- Change the query to consume fewer resources. For example, if the error indicates that the query result set is too large, you can:
- Limit the number of records returned by the query by
- Using the take operator
- Adding additional where clauses
- Reduce the number of columns returned by the query by
- Using the project operator
- Using the project-away operator
- Using the project-keep operator
- Use the summarize operator to get aggregated data.
- Increase the relevant query limit temporarily for that query. For more information, see query limits - limit on memory per iterator. This method, however, isn’t recommended. The limits exist to protect the cluster and to make sure that a single query doesn’t disrupt concurrent queries running on it.
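If the failing operator is a join, for example, the per-iterator memory budget can be raised for that query only with a set statement (T1, T2, and Key are hypothetical names):

```kusto
// Raise the per-iterator memory budget to 8 GB for this query only.
set maxmemoryconsumptionperiterator = 8589934592;
T1
| join kind=inner T2 on Key
```

Prefer reshaping the query first; raising the budget should be a last resort, for the reasons given above.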